PREFACE

The  present  work  is  based  upon  the  lectures  which  I  have 
delivered,  usually  in  alternate  years,  at  Harvard  University. 
It  is  not  intended  primarily  as  a  contribution  to  mathematical 
science,  but  as  a  text-book  introductory  to  a  branch  of 
mathematics  which  has  assumed  an  unexpected  importance 
in  recent  times. 

There are plenty of good books dealing with the theory of
mathematical probability. In French we have the beautiful
treatises of Bertrand* and Poincaré† (the former reads like
a romance, the latter has much of the originality and brilliance
characteristic of the master), as well as the text of Borel‡
on the same high level, to say nothing of others of less
note. In German there is, first of all, the encyclopaedic but
readable text of Czuber,§ the translation of Markoff,‖ with
its unusual attention to rigour, as well as several others.
In Italian there is the recent work of Castelnuovo,¶ careful,
critical, and judicious. How is it in English? There is
only one recent text-book,** that of Fisher, very full in its
treatment of statistics and frequency curves, but omitting
many of the most important parts of the subject. The striking
book on probability by Keynes†† is purely philosophical in
interest, inclining to the thesis that probability is not a
mathematical subject anyway.

* Calcul des probabilités, Paris, 1889.
† Calcul des probabilités, 2nd ed., Paris, 1912.
‡ Éléments de la théorie des probabilités, Paris, 1908.
§ Wahrscheinlichkeitsrechnung, 2nd ed., Leipzig, 1908.
‖ Wahrscheinlichkeitsrechnung, Leipzig, 1912.
¶ Calcolo delle probabilità, Rome, 1919.
** Mathematical Theory of Probabilities, 2nd ed., New York, 1922.
†† Treatise on Probability, London, 1921.

It  would,  of  course,  be  far  better  if  every  English-speaking 
reader  were  a  sufficient  master  of  foreign  languages  to  study 
all  of  these  excellent  texts,  but  such  is  manifestly  not  the 
case.  The  simple  fact  is  that  such  readers  absolutely  will 
not  make  the  linguistic  effort  necessary.  The  need  for  a  brief 
but  comprehensive  English  text  is  obvious,  if  regrettable. 

From  the  purely  mathematical  point  of  view,  the  calculus 
of  probabilities  is  somewhat  unsatisfactory.  To  begin  with, 
we  are  forced  to  use  approximate  formulae,  and  it  is  not 
always  easy  to  have  an  exact  knowledge  of  their  degree 
of  exactness,  at  least  without  arduous  calculations.  Then 
certain  fundamental  laws,  like  the  Gaussian  Law  of  Error, 
are  based  on  a  variety  of  so-called  proofs,  each  making  some 
very  broad  assumptions  of  doubtful  validity.  And  lastly, 
there  is  a  nasty  habit  of  developing  a  formula  under  the 
assumption  that  it  holds  for  a  very  limited  range,  and  then 
calculating  the  constants  by  computing  out  to  infinity.  For 
this  reason  the  mathematician  is  tempted  at  times  to  view 
the whole subject with distrust. This is a mistake. However
the formulae may be derived, they frequently prove
remarkably  trustworthy  in  practice.  The  proper  attitude 
is  not  to  reject  laws  of  doubtful  origin,  but  to  scrutinize 
them  with  care,  with  a  view  to  reaching  the  true  principles 
underneath.  It  seems  to  me  that,  in  the  last  analysis, 
probability  is  a  statistical,  that  is  to  say,  an  experimental 
science,  and  the  mathematical  problem  is  to  establish  rules 
which  yield  correct  and  valuable  results. 

Perhaps  the  most  characteristic  feature  of  the  present  work 
is  that  the  statistical  definition  of  probability  is  adhered  to 
throughout.  This  has  been  done  in  philosophical  discussions, 
and  Castelnuovo  comes  very  near  to  adopting  it,  but  the 
usual method is to have several different definitions of
probability, and reconcile them tant bien que mal.

As a matter of history, the calculus of probability started
with  the  study  of  games  of  chance.  The  present  book  does 
the  same.  Of  course,  this  branch  of  the  subject  is  not  the 
most  important  to-day,  but  in  studying  any  science  it  is  wise 
to  pay  some  attention  to  the  problems  that  gave  it  birth. 
Moreover,  from  a  didactic  point  of  view,  it  is  doubtful 
whether  the  plan  of  replacing  problems  in  games  of  chance 
by  problems  in  life  insurance  is  likely  to  increase  the  interest 
of  the  beginner.  On  the  other  hand,  the  tendency  which 
some  people  show  of  attempting  to  solve  all  problems  in 
probability  by  assimilating  them  to  drawing  balls  from  an 
urn  is  fundamentally  unsound,  as  it  departs  from  the  facts. 

The  subjects  of  mean  value  and  expectation,  which  have 
always  played  a  central  role  in  the  theory  of  probabilities, 
have  taken  on  additional  importance  in  recent  years,  owing 
to  the  idea  of  dispersion,  and  its  application  to  statistical 
series.  For  that  reason  they  have  been  given  a  good  deal  of 
prominence.  Per  contra,  geometrical  probability,  which  is 
little  more  than  a  plaything,  and  the  probability  of  causes, 
which  rests  on  very  shaky  foundations,  are  treated  briefly. 
Yet  they  should  not  be  omitted  entirely,  for  the  former  is 
related  to  statistical  mechanics,  and  the  latter  gives  the  only 
answers  we  have  to  certain  questions  which  recur  insistently. 

The  most  important  part  of  the  theory  is  that  which  deals 
with the distribution of errors of observation. The fundamental
question here is what to do with the exponential law
of Gauss. I have tried to make it as plausible as I could by
basing it on very broad assumptions, even though this adds
somewhat to the length of the deduction. I have, however,
given  the  principles  of  combining  observations  as  far  as 
possible  independently  of  the  Gaussian  law.  The  study  of 
errors  in  two  dimensions,  which  formerly  interested  few  but 
students  of  artillery  practice,  has  taken  on  a  new  importance 
through  its  relation  to  statistical  correlation. 

The treatment of least squares and indirect observations
follows traditional lines. In studying the application of least
squares to curve fitting, I have briefly explained the modern
method of moments. I have also included a summary treatment
of the applications of probability to such widely divergent
topics as the kinetic theory of gases and life insurance.

In  general,  it  has  been  my  idea  to  give  the  mathematical 
basis underlying each of the important applications of probability
rather than to write a treatise on games of chance, or
errors  of  observation,  or  the  combination  of  measurements, 
or  statistics,  or  statistical  mechanics,  or  insurance. 

With a view to adding to the didactic value of the work
I  have  introduced  a  certain  number  of  exercises  for  the 
student.  It  would  be  easy  to  multiply  these  indefinitely. 
The  few  which  I  have  chosen  seemed  to  me  particularly 
interesting,  but  that,  perhaps,  is  a  matter  of  individual 
preference. The teacher or student will find little difficulty
in  adding  to  the  number. 

Paragraphs  marked  ^  are  more  difficult  than  the  others, 
and  may  well  be  omitted  by  the  beginner. 

There  is  little  need  for  elaborate  bibliographical  notes  as 
Czuber's  comprehensive  report,*  though  not  entirely  free  from 
mistakes,  covers  the  ground  thoroughly. 

J.  L.  C. 

Cambridge,  U.S.A., 
December  1924. 

* Die Entwickelung der Wahrscheinlichkeitsrechnung und ihrer Anwendungen.
Jahresbericht der deutschen Mathematikervereinigung, Vol. vii, Part 2, Leipzig, 1899.
CHAPTER I

THE SCOPE AND MEANING OF MATHEMATICAL PROBABILITY

Everybody has a pretty good working knowledge of the
meaning of the words 'probable' and 'improbable'. If a man
be asked: 'Is it probable that the sun will rise to-morrow?'
'Is it probable that you will be elected next Grand Lama of
Tibet?' he knows precisely what the question means, and
is able to answer without hesitation. That is because the
terms are used in a general sense, without any attempt at the
refinement of accuracy needful for mathematical purposes.
In exactly the same way, everybody understands the statement
that Cap Gris-Nez is the nearest point in France to
Great Britain. The trouble comes when we undertake to say
what we mean by a mathematical point, and in like manner
we encounter serious difficulty when we try to express
probability in exact mathematical language. A brilliant contemporary
philosopher has defined mathematics as the science
where we never know what we are talking about, or what
our results mean; the calculus of probability is no exception
to this pessimistic definition.

How  shall  probability  be  defined  as  a  mathematical  term  ? 
The  first  definition  which  we  have  to  consider,  and  which  is 
ascribed  to  James,  alias  Jacob,  Bernoulli,  is  that  probability 
is  the  measure  of  the  strength  of  our  expectation  of  a  future 
event.  If  we  feel  almost  sure  that  an  event  is  going  to 
happen,  we  say  that  it  is  highly  probable ;  in  the  contrary 
case,  we  call  it  highly  improbable ;  and  if  we  are  so  inclined, 
we  may  express  our  expectation  in  the  form  of  a  bet,  for  or 
against  the  arrival  of  the  event.  Probability  appears  as  the 
mathematical measure of our state of expectancy.

The mere statement of such a definition is sufficient to
raise a host of objections. If, as the statement suggests,
probability is merely a sort of psychological coefficient, the
practical application of it should be in the psychological
laboratory, where it could be measured in the same way as
reaction time, intensity of response to stimulus, persistence of
illusion. But no large circle of persons is interested in any
such sort of probability as that. Moreover, different persons
will assign different degrees of probability to the same future
event, and the same person will feel differently about it at
different times, according to his mood, or the state of his
digestion. In consequence of this, the supporters of such
a definition put in a qualifying adjective, saying that probability
is the measure of our 'intelligent' expectation of
a future event. This raises the question, 'What is intelligent
expectation?' The answer would seem to be that it is the
expectation of an intelligent person, reasoning on the facts in
the case, and not on his own personal hopes and fears. But
if all intelligent persons will reach the same degree of
expectancy when they reason intelligently on the facts, then
the measure of this expectancy must be a function of the
facts themselves, and not of the individual, who may thus
be left out of account.

A lineal successor of Bernoulli has appeared in very recent
times in Keynes, whose remarkable book was referred to in the
preface. This writer's main thesis is that probability is not
concerned with events, but with judgements or propositions.*
This is a question of definition, and his point of view is
certainly legitimate. But a science of the probability of
judgements can scarcely be made a subject of exact mathematical
treatment, and this also is one of Keynes's principal
contentions.† His method is, to use his own words:‡
'To regard subjective probability as fundamental, and to
treat all other relative conceptions as derivative from this.'
It is perhaps open to question whether he has entirely answered
the difficulties raised against Bernoulli, but in any case
it is perfectly evident that a line of reasoning which starts
from the premiss that a certain subject is non-mathematical,
is not a good introduction to a mathematical treatment of
that subject.

* Loc. cit., p. 5.
† Ibid., p. 34.
‡ Ibid., p. 282.

A  second  definition,  which  has  the  great  authority  of  John 
Stuart Mill, may be expressed as follows:* Let us suppose
that  an  event  depends  upon  a  certain  nexus  of  causes,  each  of 
measurable  intensity.  We  measure  the  total  field  or  extent 
of  variation  of  these  causes,  and  take  this  as  denominator, 
while  as  numerator  we  take  the  measure  of  that  field  or 
extent  of  variation  which  will  produce  what  we  call  a 
favourable outcome. The fraction is defined as the probability
for  a  favourable  result.  For  instance,  if  a  coin  be  spun  in  the 
air,  there  is  an  equal  chance  that  it  will  turn  up  head  or  tail, 
because  the  initial  angular  velocities  of  spin  and  translational 
velocities  of  the  centre  of  gravity  which  will  cause  the  coin  to 
show  a  head,  cover  one-half  the  total  ranges  of  angular 
and  translational  velocities  which  need  be  considered  in  the 
particular  problem. 

The difficulties attending such a definition are so many that
it is scarcely worth consideration. What antecedent events
are to be classed as causes? Are there not different groups of
independent events, each of which might be called the group
of causes, and how do we know that the result will be
independent of the choice of group? Moreover, it is by no
means certain that the calculation will yield in every case
the number which the uninitiated would designate as the
probability for a favourable outcome. Suppose that in a
certain congressional district there were five rock-ribbed
Republicans to each three stalwart Democrats. The forces
tending to elect a Republican congressman would seem to
bear a ratio of five to three to those tending to elect a
Democrat, but we should hesitate to say that there were
only five chances in eight that the successful candidate would
be a Republican.

* Logic, 8th ed., vol. ii, p. 72, London, 1872. Mill does not require all probabilities
to come under this head, but insists that we have here the real
reason for the persistence of statistical ratios.

We now come to the definition of probability which is
used in practically all mathematical treatises on the subject,
and which can be put into the following words: 'An event
can happen in a certain number of ways, which are all equally
likely. A certain proportion of these are classed as favourable.
The ratio of the number of favourable ways to the total
number is called the probability that the event will turn out
favourably.'
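
The ratio in this definition can be made concrete by direct enumeration. The sketch below is only an illustration under stated assumptions: the two-dice setting, the name `classical_probability`, and the predicate are introduced here and do not come from the text. The 36 ordered results are simply taken to be equally likely, which is precisely the supposition examined in what follows.

```python
from fractions import Fraction
from itertools import product


def classical_probability(ways, is_favourable):
    """Ratio of favourable ways to total ways, all ways assumed equally likely."""
    ways = list(ways)
    favourable = sum(1 for w in ways if is_favourable(w))
    return Fraction(favourable, len(ways))


# Hypothetical illustration: two dice, 36 ordered results taken as equally likely.
two_dice = product(range(1, 7), repeat=2)

# Favourable ways: the two faces add up to seven.
p_seven = classical_probability(two_dice, lambda w: w[0] + w[1] == 7)
print(p_seven)
```

The function counts ways and nothing more; whether the ways deserve to be called equally likely is a question the code cannot answer.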

This  definition  is  ever  so  much  more  clear-cut  than  the 
others,  and  is  capable  of  immediate  application  in  many  cases. 
Nevertheless  there  are  two  objections  which  may  be  easily 
raised. The first is, 'Exactly what is meant by the phrase
"equally likely"?' This question is of fundamental importance
in our subject, and must be treated in detail later.
The  other  criticism,  on  the  strength  of  which  we  reject  this 
definition,  is  that  as  it  stands  it  excludes  many  probabilities 
which  are,  nevertheless,  among  the  most  important.  As  stated 
above,  it  is  inapplicable  to  a  large  number  of  cases  where  the 
number  of  ways  is  infinite ;  also  it  excludes  the  whole  field 
of  statistical  probability.  If  we  ask  the  probability  that 
a  letter  in  the  lost-letter  office  contain  money,  we  are  asking 
a  concrete  and  intelligible  question,  but  to  base  the  answer 
on  the  number  of  ways  in  which  money  can  be  put  into 
or  left  out  of  a  misdirected  letter  would  be  the  height  of 
absurdity.  It  is  the  desire  to  include  all  forms  of  probability 
under  one  definition  that  leads  us  to  the  form  which  we  shall 
now  explain. 

First  empirical  assumption. 

If an event which can happen in two different ways
be repeated a great number of times under the same
essential conditions, the ratio of the number of times
that it happens in one way, to the total number of
trials, will approach a definite limit, as the latter
number increases indefinitely.

Definition. 

The limit described in the first empirical assumption shall
be called the probability that the event shall happen in the
first way, under those conditions.*
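
What the first empirical assumption asserts can be pictured with a pseudo-random sketch. It is an illustration, not evidence: the generator is built with a fixed chance `p_first_way` (an assumed parameter, as are the function name, the checkpoints, and the seed), so the code presupposes the very limit which the assumption takes from experience.

```python
import random


def relative_frequencies(p_first_way, checkpoints, seed=0):
    """Simulate repeated trials of a two-way event and record the ratio
    (times it happened in the first way) / (total trials) at each checkpoint."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wanted = set(checkpoints)
    hits = 0
    ratios = {}
    for n in range(1, max(checkpoints) + 1):
        if rng.random() < p_first_way:
            hits += 1
        if n in wanted:
            ratios[n] = hits / n
    return ratios


# Hypothetical run: a two-way event whose first way has chance one-half.
ratios = relative_frequencies(0.5, [10, 100, 1_000, 10_000, 100_000])
for n, ratio in ratios.items():
    print(n, ratio)
```

In a run of this kind the printed ratios typically wander at first and settle near one-half as the number of trials grows, which is the behaviour the assumption attributes to real series of trials.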

There are quite as many objections to this method of
defining probability as to any other, and we must set about
defending it as best we can. To begin with, how do we know
that the first empirical assumption is true? The only answer
is that experience in many fields under all sorts of circumstances
has demonstrated its truth. Few laws of nature are
so well established as this, and we are not only justified, but
compelled, when it appears to be at fault, to examine whether
the conditions of experiment have not undergone unnoticed
but important alterations. For instance, the forms of inadvertence
which cause people to leave their umbrellas in public
conveyances are many and various, but the proportion of
umbrellas to the total number of travellers in any particular
locality is apparently fairly constant. An increase of marked
amount would suggest the query as to whether the weather
had not been unusually bad; a notable diminution would
suggest either good weather or dishonest employees.

* This is essentially the definition used, for the most part, in Mill's Logic,
and defended in great detail in Venn's Logic of Chance, 3rd ed., London, 1888.
Neither of these writers, however, draws a sufficiently sharp distinction
between a ratio and the limit of a ratio. A much more accurate statement
will be found in an article by Von Mises, 'Grundlagen der Wahrscheinlichkeitsrechnung',
Mathematische Zeitschrift, vol. v, pp. 53 ff. Keynes, loc. cit.,
part I, ch. viii, attacks it with the utmost vigour. His objections seem to me
to fall completely to the ground if one considers that probability has to do
with events, not with judgements. He refuses for philosophical reasons to
consider the probability of events, he scarcely will acknowledge the existence
of such a probability. I am equally sure that the probability of events is the
only kind worthy of serious mathematical study.

The most serious difficulty with the definition is the
following: What is meant by 'the same essential conditions'?
No two experiments are ever performed under identical
conditions. There are always slight changes in temperature,
barometric pressure, the state of the experimenter's digestion,
the chemical composition of his blood. How can we tell
what conditions are essential, and what ones are not? The
objection is perfectly valid, and does not admit of any perfect
refutation. We must recognize, however, that it does not
bear on the present question alone, but on the whole of
experimental science. If we seek to determine any physical
law by experimental means, we tacitly assume that such
changes as occur in the conditions are immaterial to the
result. Without some such fundamental postulate all experimental
science would be impossible. Shall the psychologist,
experimenting on the sensitiveness of a patient to certain
faint stimuli, give up all hope of learning the truth because
between one experiment and another the earth will have
performed a certain number of turns about its axis, and will
have travelled a certain distance along its orbit, while the
whole solar system will have made a certain progress through
space? Evidently the postulate that we can distinguish
between essential and unessential conditions lies at the basis
of all inductive science, and cannot be charged up to the
calculus of probability.

Another criticism which can be levelled against our definition
is the following. In making probability merely the limit of
a statistical ratio, we exclude the possibility of ever determining
a probability except as the result of a long series of experiments,
and even then we could only determine it approximately.
There are some writers who are frankly willing to accept this
limitation,* but our own view is that we need not tie our
hands quite to this extent. It is hard to base the probability
that a card drawn at random from a pack should be black
upon a series of experiments ad hoc, when we do not know
whether such experiments have really ever been performed.
We therefore make our

Second  empirical  assumption. 

If  an  event  can  happen  in  a  certain  number  of  ways, 
all  of  which  are  equally  likely,  and  if  a  certain 
number  of  these  be  called  favourable,  then  the  ratio 
of  the  number  of  favourable  ways  to  the  total  number 
is  equal  to  the  probability  that  the  event  will  turn 
out  favourably. 

* Venn, loc. cit. Mill, on the other hand, looks upon probabilities determined
by reasoning as more certain than those determined statistically.

It cannot be too much emphasized that this is an empirical
assumption based upon experience, exactly like the other. It
tells us that the ratio of favourable outcomes to trials approaches,
as a limit, the ratio of the number of favourable ways to total
ways. It is sometimes assumed that the one or the other of
these assumptions can be proved by Bernoulli's theorem, to be
developed in a later chapter. This is pure illusion. No one's
theorem, based on a priori considerations, can prove that
in practice a coin will show heads about one-half the time.
Moreover, a few moments' reflection will show that in one
guise or another we must have both of these assumptions.
Without the second, we could never predict the probability of
an outcome from the data; the matter would always have
to be put to the test. Without the first assumption, there
would be absolutely no connexion between the ratio of favourable
to total ways and the statistical ratio determined by
practice; a probability defined by the former would be an
abstract number, having no practical significance.*
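
The two quantities compared in this paragraph, the ratio of favourable to total ways and the statistical ratio, can be set side by side for the black card mentioned earlier. The 52-card pack, the names `pack`, `ratio_of_ways`, and `observed_ratio`, and the seed are assumptions made for illustration. Simulated draws are no substitute for experience and prove nothing about real cards; the sketch only shows which two numbers the second assumption declares to agree in the limit.

```python
import random
from fractions import Fraction

# Hypothetical pack: 13 ranks in each of four suits, two suits of each colour.
SUIT_COLOUR = {"spades": "black", "clubs": "black",
               "hearts": "red", "diamonds": "red"}
pack = [(rank, suit) for suit in SUIT_COLOUR for rank in range(1, 14)]

# Ratio of favourable ways (black cards) to total ways (all cards).
black_cards = sum(1 for _, suit in pack if SUIT_COLOUR[suit] == "black")
ratio_of_ways = Fraction(black_cards, len(pack))


def observed_ratio(trials, seed=1):
    """Statistical ratio: share of simulated draws (with replacement) that are black."""
    rng = random.Random(seed)
    black = 0
    for _ in range(trials):
        _, suit = rng.choice(pack)
        if SUIT_COLOUR[suit] == "black":
            black += 1
    return black / trials


print(ratio_of_ways)
print(observed_ratio(50_000))
```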

There is one other very important point in this second
assumption, which we mentioned above, and which we must
now examine carefully. What is meant by 'equally likely'?
If we say that two ways are equally likely when the number
of arrivals either way bears to the number of trials a ratio
with the same limit, we are running around in a circle, and
saying that if the limit of a certain ratio is the probability of
success, why, then the probability of success is the limit of
that ratio. No, if our second assumption is to tell us anything
at all, we must mean something else by 'equally
likely'.

There  has  been  a  good  deal  of  debate  among  philosophers  as 
to  just  what  meaning  should  be  attached  to  these  mystic  words, 
and  two  sharply  divergent  views  have  been  expressed,  and 
ably  defended.  The  first  of  these,  which  has  the  great 
authority of Laplace,† and has been vigorously defended

* This is admirably brought out by Cournot, Théorie des chances, Paris,
1843, pp. 437 ff.

† See his Traité analytique des probabilités, Paris, 1812.

by Stumpf,* is expressed by saying that two results are
equally likely when we know that one of them must happen,
but have no information leading us to expect the one rather
than the other. Everybody will admit that this expresses
a necessary condition that two events should be equally likely;
the doubt is as to its sufficiency. Assuming that Mars is
inhabited, what is the probability that the inhabitants are
carnivorous? The most imaginative observer will acknowledge
that, as far as our present information goes, we are completely
in the dark on this interesting point; there is nothing to
guide our opinion. Shall we, therefore, say that the probability
that these enterprising engineers are carnivorous is
expressed by the fraction 1/2? And if we say so, shall we go on
to the assertion that if future astronomic research revealed to
us that a large number of the heavenly bodies were inhabited
we might expect to find carnivorous inhabitants in about one-half
of them? Such an assumption is the merest juggling
with words, and we do not hesitate to pronounce against the
sufficiency of this condition.† It is not unnatural, then, that
some philosophers have been led to the opposite extreme, and
have maintained that we can only say that two events are
equally likely if we are acquainted with all the causes tending
to produce the one or the other, and know them to be of equal
potency. We do not say that a spinning coin is equally likely
to turn up head or tail because we know no reason to expect
the one rather than the other; we make this affirmation only
upon the hypothesis that it is a real coin and not a counterfeit,
nearly homogeneous, with the centre of gravity near the
middle, while the method of throwing is such that it has no
tendency to favour the one face at the expense of the other.
This idea was skilfully elaborated by Von Kries‡ in his
theory of 'range', which is essentially Mill's idea of equal
field of variation for forces. Two ways in which a thing can

* Ueber den Begriff der mathematischen Wahrscheinlichkeit, Sitzungsberichte, Royal
Bavarian Academy, Philosophical Class, 1892.

† There is a good discussion of this point in Keynes, loc. cit., part I, ch. iv.

‡ Prinzipien der Wahrscheinlichkeiten, Freiburg, 1886. The last chapter of
this work contains an excellent historical summary of the various theories of
probability.

happen may be said to be equally likely when, and only when,
we know that the fields of variation or the forces tending
to produce the one or the other have equal content.

It  is  certain  that  we  have  here  the  best  possible  way  for 
determining  whether  two  events  are  equally  likely  or  not, 
when  it  can  be  applied ;  unfortunately  in  many  cases  we  have 
no  complete  information,  and  are  tempted  to  fall  back  on 
the  other  principle,  namely,  that  we  have  no  reason  to  believe 
the  range  for  one  set  of  causes  to  be  greater  than  that  for  the 
other. There is a subsidiary difficulty, which Von Kries
himself  recognized,  and  which  raises  fearful  havoc  in  certain 
parts  of  the  theory  of  probability.  Suppose  that  we  measure 
two  ranges  for  a  certain  variable,  and  find  them  equal.  We 
next  replace  that  variable  by  a  function  of  itself,  and  measure 
the  corresponding  ranges  of  this  new  variable.  They  may  be 
very  far  from  being  equal  to  one  another.  Consequently  two 
eventualities  which  seemed  to  be  equally  likely  when  stated 
in  terms  of  the  first  variable,  might  appear  far  otherwise  in 
terms  of  the  second.  For  instance,  suppose  that  we  know 
that  a  certain  variable  lies  between  10  and  1,000,  the  ranges 
10  to  100  and  100  to  1,000  are  very  different  in  magnitude, 
and  would  not  seem  to  produce  equally  likely  cases  for  any 
event  dependent  on  them.  But  if  we  found  that  the  natural 
measurement  to  make  was  not  the  variable  itself,  but  its 
logarithm; if the variable appeared naturally as the antilogarithm
of a certain number, then the ranges of 1 to 2 and
2  to  3  for  the  logarithm  would  seem  to  produce  equally  likely 
cases. 
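
The arithmetic of this example can be checked directly. In the sketch below the function names are hypothetical, and the 'content' of a range is taken to mean its length, measured first in the variable itself and then in its common logarithm.

```python
import math
from fractions import Fraction

TOTAL_LO, TOTAL_HI = 10, 1000  # the variable is known to lie between 10 and 1,000


def share_linear(lo, hi):
    """Share of the total range covered by [lo, hi], measured in the variable itself."""
    return Fraction(hi - lo, TOTAL_HI - TOTAL_LO)


def share_log(lo, hi):
    """Share of the total range covered by [lo, hi], measured in the common logarithm."""
    total = math.log10(TOTAL_HI) - math.log10(TOTAL_LO)
    return (math.log10(hi) - math.log10(lo)) / total


# Measured in the variable: the two ranges are very unequal (1/11 against 10/11).
print(share_linear(10, 100), share_linear(100, 1000))

# Measured in the logarithm: 1 to 2 and 2 to 3 are of equal length.
print(share_log(10, 100), share_log(100, 1000))
```

The same pair of ranges thus yields shares of 1/11 and 10/11 under one measure and equal halves under the other, which is the difficulty Von Kries recognized.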

In spite of this difficulty, our own preference is strongly
towards the latter form of definition. Not the least of its
merits is that it is in objective, not subjective, shape, and so
harmonizes with our general point of view. We shall say,
then, that the words 'equally likely' cannot be used unless the
essential conditions governing the result are known, using the
word essential in the same sense as in Assumption 1. It is
conceivable that some of these essential conditions might tend
to favour one outcome, some another. If nothing be known
about the relative strength of these diverse tendencies, we
cannot go further. But if we can say that the total resultant
of the essential conditions does not tend to favour one outcome
rather than the other, then the two may be said to be equally
likely.

Our second empirical assumption enables us to predict
probabilities  in  cases  where  the  number  of  ways  in  which  an 
event  can  happen  is  finite.  We  need  some  corresponding 
assumption  in  the  case  where  there  are  an  infinite  number 
of  possibilities.  What  we  shall  need  here  is  some  assumption 
as  to  the  probability  that  a  group  of  variables  should  take 
values  within  a  small  neighbourhood  of  a  given  group.  We 
have  lurking  in  the  background  the  same  difficulty  we  saw 
above  in  finding  two  ranges  leading  to  equally  likely  results ; 
at  best  we  cannot  make  any  very  clear-cut  hypothesis. 

Third  empirical  assumption. 

IF AN EVENT DEPEND UPON n INDEPENDENT VARIABLES
$X_1, X_2, \dots, X_n$, WHICH CAN VARY CONTINUOUSLY IN AN n-DIMENSIONAL
CONTINUOUS MANIFOLD, THERE EXISTS SUCH AN ANALYTIC
FUNCTION $F(X_1, X_2, \dots, X_n)$ THAT THE PROBABILITY FOR A
RESULT CORRESPONDING TO A GROUP OF VALUES IN THE
INFINITESIMAL REGION

$$X_1 \pm \tfrac{1}{2}\,dX_1,\quad X_2 \pm \tfrac{1}{2}\,dX_2,\quad \dots,\quad X_n \pm \tfrac{1}{2}\,dX_n$$

DIFFERS BY AN INFINITESIMAL OF HIGHER ORDER FROM

$$F(X_1, X_2, \dots, X_n)\,dX_1\,dX_2 \cdots dX_n.$$
It must be acknowledged that as long as we are in total
ignorance as to what function F may be, this assumption
does not lead us very far. The requirement that it should
be analytic is unnecessarily strong, but we need a continuous
function, and we can approach as near to such a function
as we please by an analytic function. Moreover, the assumption
is not quite so fruitless as one might fear. Let p be
the probability that the event take a form which we shall
call favourable, and let this correspond to a region of
variation R; then


$$p = \int_R F(X_1, X_2, \dots, X_n)\,dX_1\,dX_2 \cdots dX_n.$$

If the total field of variation be T, we have

$$1 = \int_T F(X_1, X_2, \dots, X_n)\,dX_1\,dX_2 \cdots dX_n.$$


Now let $x_1, x_2, \dots, x_n$ be such a set of independent variables,
functions of the old ones, that regions R and T correspond to
regions r and t, and that

$$\frac{\partial(x_1, x_2, \dots, x_n)}{\partial(X_1, X_2, \dots, X_n)} = F(X_1, X_2, \dots, X_n).$$

Then the probability for a set of variables lying in the
favourable region is

$$p = \frac{\int_r dx_1\,dx_2 \cdots dx_n}{\int_t dx_1\,dx_2 \cdots dx_n} \qquad (1)$$


and this is the ratio of the content of the desired manifold r
to the total manifold t, when measured in terms of the
variables $x_1, x_2, \dots, x_n$. Or, to put the matter otherwise, the
probability of a favourable outcome is the ratio of the content
of the favourable range to that of the total range, when the
right variables are chosen. In many cases, the mere statement
of the problem leads naturally to the right variables;
it  is  only  when  there  is  considerable  doubt  as  to  which 
variables  these  are  that  the  problem  is  obscure.  And  of 
course  the  correctness  of  any  answer  depends  upon  the 
correctness  of  the  choice  of  variables.  We  shall  explain 
these  points  in  greater  detail  in  a  subsequent  chapter,  that 
dealing  with  geometrical  probability. 

There  is  one  other  possible  method  of  defining  probability 
which  should  receive  a  passing  notice.  In  modern  discussions 
of the foundations of mathematics, we do not define points or
numbers, except in the sense that we make certain independent
postulates about them. In the same way, we might say that
the probability of an event was a number which was a function
of that event, and which obeyed certain formal laws of logic.
This method, which we should like to see developed, would
be unexceptionable from the point of view of abstract mathematics,
but the real importance of the calculus of probability
does not lie in any such field as that. We emphasize once
more that we are dealing with what is somewhat loosely
called an 'applied science', and the fundamental questions
do not deal with the abstract philosophical nature of probability,
which always seems to remain somewhat obscure and
elusive, but rather with the meanings of numerical probabilities in
specific cases. The purpose of this first chapter has been to
develop a general definition for that meaning which should
'make sense' in every case.


CHAPTER    II 
ELEMENTARY  PRINCIPLES  OF  PROBABILITY 

§  1.    Formulae  for  Combinations  and  Arrangements. 

If n be a positive integer, we give the name factorial n to
the product $n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$, and we have for
this a special notation, namely

$$n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 = n! \qquad (1)$$

It is convenient to extend this equation to the case where
n = 0, by writing

$$0! = 1.$$

The  student  must  not  forget  that  this  is  a  definition,  it  is 
not  a  statement  that  if  0  be  multiplied  by  the  positive  integers 
less  than  itself,  the  product  is  1. 

Suppose  that  from  n  distinguishable  objects  we  pick  r 
objects,  and  arrange  them  in  order;  in  how  many  ways  can 
this  be  done?  Evidently  we  have  a  choice  of  n  objects  for 
the  first  place,  n—1  for  the  second,  &c.  The  number  of 
arrangements is $n(n-1)\cdots(n-r+1)$.

This is sometimes called the number of permutations of n
things taken r at a time, and written $_nP_r$, but we do not need
to burden ourselves either with the name or the symbol.

A  more  interesting  and  important  number  is  that  which 
tells  us  in  how  many  ways  r  objects  can  be  picked  from  n 
objects,  regardless  of  order.  If  this  number  be  x,  and  if  we 
subsequently  multiply  by  the  number  of  ways  that  r  objects 
can  be  arranged  among  themselves,  the  product  is  the  number 
of  arrangements  of  r  objects  taken  from  n  objects,  thus 


$$x \cdot r! = n(n-1)\cdots(n-r+1),$$

$$x = \frac{n(n-1)\cdots(n-r+1)}{r!},$$

$$x = \frac{n!}{r!\,(n-r)!}. \qquad (2)$$

It is to be noted here that r and (n - r) appear symmetrically,
but that might have been foreseen, for the number
of ways that we can pick r things to be taken from n things
is the number of ways that we can pick (n - r) to be left.
The total number of ways in which something can be taken is

$$\sum_{r=1}^{n} \frac{n!}{r!\,(n-r)!},$$

and this may be written, by the aid of the binomial theorem,

$$(1+1)^n - 1 = 2^n - 1.$$

This again might easily have been foreseen, for each individual
object may be taken or left, irrespective of the others,
but we must exclude the one case where all are left.
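The counting formulae above can be checked on a small case. The sketch below is only an illustration: the function names `arrangements` and `selections` are ours, and the values n = 6, r = 3 are chosen arbitrarily. It compares the formulae with a direct enumeration.

```python
from itertools import combinations, permutations
from math import factorial

def arrangements(n, r):
    """n(n-1)...(n-r+1): ordered choices of r objects out of n."""
    return factorial(n) // factorial(n - r)

def selections(n, r):
    """Formula (2): choices of r objects out of n, regardless of order."""
    return factorial(n) // (factorial(r) * factorial(n - r))

n, r = 6, 3
objects = range(n)
listed_arrangements = len(list(permutations(objects, r)))
listed_selections = len(list(combinations(objects, r)))

# Number of ways in which something (at least one object) can be taken.
total_nonempty = sum(selections(n, k) for k in range(1, n + 1))

print(arrangements(n, r), selections(n, r), total_nonempty)
```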

Let  us  return  to  formula  (2).  An  easy  and  important 
extension  is  found  as  follows.  In  how  many  ways  can  n 
objects  be  divided  into  a  group  of  a  objects,  another  of  b 
objects,  another  of  c  objects,  and  so  on?  The  first  group  can 
be chosen in $n!/(a!\,(n-a)!)$ ways. The second group can be
taken from the remainder in $(n-a)!/(b!\,(n-a-b)!)$ ways, and
so on. Multiplying together, we get

$$\frac{n!}{a!\,b!\,c!\cdots} \qquad (3)$$

There  is  one  modification  of  this  formula  which  is  easily 
overlooked. Let n = rs, and let us imagine that we have
r groups, each of s objects. The formula above will give as
the number of ways $n!/(s!)^r$, and this answer is usually right.
But in certain cases we may wish to make no distinction
between the first group, the second group, &c., so that to get
the answer, we should divide this by the number of ways in
which the r groups might be arranged in an order of preference,
thus getting $n!/((s!)^r\, r!)$.

As an example, we see that the number of ways that four
hands can be dealt in such a game as whist or bridge is
$52!/(13!)^4$, for the situation of a hand with regard to the
dealer is important. But if we ask in how many ways can
52 cards be divided into four indistinguishable piles, the
correct answer is $52!/((13!)^4\, 4!)$.

This  number  is  a  good  deal  smaller  than  the  other,  but  is 
by  no  means  a  small  number  for  all  that. 
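Formula (3) and its modification for indistinguishable groups may be sketched as follows. The function `divisions` and its `labelled` flag are hypothetical names introduced here for illustration; the unlabelled case is only treated, as in the text, when all groups are of the same size.

```python
from math import comb, factorial

def divisions(*sizes, labelled=True):
    """Ways of dividing sum(sizes) objects into groups of the given sizes."""
    count = factorial(sum(sizes))
    for s in sizes:
        count //= factorial(s)          # formula (3): n!/(a! b! c! ...)
    if not labelled:
        if len(set(sizes)) != 1:
            raise ValueError("unlabelled case assumes groups of equal size")
        count //= factorial(len(sizes))  # no distinction between the groups
    return count

deals = divisions(13, 13, 13, 13)                   # four hands at whist
piles = divisions(13, 13, 13, 13, labelled=False)   # four indistinguishable piles

print(deals)
print(piles)
```

The first number agrees with the step-by-step reasoning of the text, in which the groups are chosen one after another from what remains.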

It  is  time  to  illustrate  these  principles  with  some  examples. 

Example 1] In a certain company there are 15 men and
10 women; in how many ways can a committee be
picked including 3 men and 2 women?

The answer is clearly the product of the numbers of ways
in which the representatives from the two sexes can be
chosen, namely

$$\frac{15!}{12!\,3!} \times \frac{10!}{8!\,2!} = \frac{15 \times 14 \times 13}{1 \times 2 \times 3} \times \frac{10 \times 9}{2 \times 1} = 20{,}475.$$

Example 2] 3 travellers arrive at a town where there are
5 inns; in how many different ways can they be
lodged?

The natural way is to treat each traveller as an independent
unit, capable of making 5 choices, thus getting $5^3 = 125$. But
if we know further that the travellers have quarrelled on
the road, so that no two will lodge at the same inn, the choice
is reduced to $5 \times 4 \times 3 = 60$.
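Examples 1 and 2 are small enough to be verified by listing every case. The following sketch (variable names are ours) does so by brute force rather than by formula.

```python
from itertools import combinations, product

# Example 1: committees of 3 men out of 15 and 2 women out of 10.
men, women = range(15), range(10)
committees = (sum(1 for _ in combinations(men, 3))
              * sum(1 for _ in combinations(women, 2)))

# Example 2: each of 3 travellers picks one of 5 inns.
lodgings = list(product(range(5), repeat=3))
# After the quarrel, no two travellers may share an inn.
separate = [choice for choice in lodgings if len(set(choice)) == 3]

print(committees, len(lodgings), len(separate))
```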

Example 3] In how many ways can all the letters of the word
Mississippi be arranged?

If all of the letters were distinguishable, the number would
be, clearly, 11!, but we must divide this by the number of
ways in which the indistinguishable i's can be arranged, the
indistinguishable s's, and the p's, getting

$$\frac{11!}{(4!)^2\, 2!} = 34{,}650.$$

§ 2. Simple Problems in Total and Compound Probability.

Example 4] 6 cards are chosen at random from a pack of
52; what is the probability that 3 will be black and
3 red?

The words 'at random' here signify that we consider all
combinations of 6 equally likely in the sense explained in
the last chapter. We have thus, by our first two empirical
axioms, merely to find the ratio of favourable ways to total
ways, namely

$$\left[\frac{26!}{3!\,23!}\right]^2 \div \frac{52!}{6!\,46!}.$$

Example 5] A card is chosen at random from each of
6 packs; what is the probability that 3 cards will be
black and 3 red?

In this case the total number of ways is $52^6$. To find the
favourable ways we divide the 6 packs into 3 which are to
show red, and 3 to show black, and multiply by $26^6$; the
answer is

$$\frac{6!}{(3!)^2} \times \frac{26^6}{52^6} = \frac{5 \times 4}{2^6} = 0.3125.$$

It  would  not  have  been  easy  to  say  off-hand  which  of  these 
problems  would  have  the  larger  answer. 
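To see which of the two problems has the larger answer, the two ratios may be evaluated exactly. This is a minimal sketch using exact fractions; the variable names are ours.

```python
from fractions import Fraction
from math import comb

# Example 4: 6 cards from one pack, 3 black and 3 red.
one_pack = Fraction(comb(26, 3) ** 2, comb(52, 6))

# Example 5: one card from each of 6 packs, 3 black and 3 red.
six_packs = Fraction(comb(6, 3) * 26 ** 6, 52 ** 6)

print(float(one_pack), float(six_packs))
```

Drawing from a single pack gives the slightly larger chance, since each black card removed makes a further black card a little less likely.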

Problems.

1. In how many ways can a boat's crew of 8 be chosen from 20 men?

2. In how many different ways can two dice appear? How many times
will each possible sum appear?

3. Prove the 'multinomial theorem', namely,

$$(a + b + \dots + l)^n = \sum \frac{n!}{\alpha!\,\beta!\cdots\lambda!}\, a^{\alpha} b^{\beta} \cdots l^{\lambda}, \qquad \alpha + \beta + \dots + \lambda = n.$$

There are two general principles which are of fundamental
importance in doing simple problems of the present sort;
these must now be explained. Suppose that there are two
events which are mutually exclusive, if the one happen the
other cannot; and suppose that their respective probabilities
are $p_1$ and $p_2$. Let there be a large number N of trials, and
let the first event happen $M_1$ times, while the second happens
$M_2$ times. Then by the fundamental definition of probability

$$\operatorname{Lim} \frac{M_1}{N} = p_1, \qquad \operatorname{Lim} \frac{M_2}{N} = p_2.$$

Now, one of the basic theorems of the infinitesimal calculus
tells us that the limit of the sum of two variables dependent
upon the same third variable is the sum of their limits, so that

$$\operatorname{Lim} \frac{M_1 + M_2}{N} = \operatorname{Lim} \left[\frac{M_1}{N} + \frac{M_2}{N}\right] = p_1 + p_2.$$

But the limit on the left is the probability that the one
event or the other shall happen. We may apply this same
principle to any number of mutually exclusive events: the
probability that one of n mutually exclusive events shall
happen is the probability that a specified event
shall happen plus the probability that some one of the other
n - 1 shall happen. Proceeding thus by a downward mathematical
induction, we reach the

Theorem  of  total  probability,  special  case. 

The probability that one of any number of mutually exclusive
events should happen is the sum of the probabilities for
the separate events.

(When we have a constantly increasing number of probabilities,
each individual one decreasing indefinitely, if their
sum approach a definite limit as the number increases indefinitely,
that limit will be the probability in the limiting
case.)

Example 6] 2 dice are thrown; what is the probability that
the sum shown will be 7 or 11?

The sum 7 can be shown in six different ways, the sum
11 in only two, hence $\frac{6}{36} + \frac{2}{36} = \frac{2}{9} = 0.222$.
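The count in Example 6 can be confirmed by listing the 36 equally likely ways in which two dice can fall. The sketch below is one way of doing so; the variable names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally how many of the 36 throws give each sum.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# 7 and 11 are mutually exclusive, so their probabilities are added.
p_seven_or_eleven = Fraction(ways[7], 36) + Fraction(ways[11], 36)

print(ways[7], ways[11], p_seven_or_eleven)
```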

Here  is  the  second  principle.  Suppose  that  we  have  a  com- 
pound event  which  is  the  result  of  the  combination  of  two 
other  events.  We  shall  assume  that  these  two  are  mutually 
independent. What does that phrase mean? If we take
a mechanistic view of the universe, no one event is ever
independent  of  any  other,  the  outcome  of  any  event  must 
have  a  definite  effect  on  all  future  history.  Nevertheless,  the 
uncorrupted  man  of  common  sense  has  a  perfectly  definite 
idea  of  what  he  means  by  saying  that  two  events  are  mutually 
independent.  His  meaning  may  be  expressed  by  the  follow- 
ing: 

Definition] Two events are said to be mutually independent
when the probability for either is the same whether the
other happen or not.

We  take  it  as  an  empirical  fact  that  there  are  such  events 
in  the  universe,  and  that  we  can  tell  them  when  we  see  them. 
Suppose, then, that we have two mutually independent
events, the first with the probability $p_1$, the second with the
probability $p_2$; what is the probability $p_{12}$ for the arrival of
the compound event which consists in the arrival of both?
Let there be a large number N of trials. Let the first one
happen $M_1$ times, the second happen $M_2$ times, while both
happen $M_{12}$ times.

$$p_1 = \operatorname{Lim} \frac{M_1}{N}, \qquad p_2 = \operatorname{Lim} \frac{M_2}{N} = \operatorname{Lim} \frac{M_{12}}{M_1},$$

$$p_{12} = \operatorname{Lim} \frac{M_{12}}{N} = \operatorname{Lim} \left[\frac{M_1}{N} \cdot \frac{M_{12}}{M_1}\right].$$

Now the limit of the product of two variables is the
product of their limits, hence $p_{12} = p_1 p_2$.

Theorem  of  compound  probability. 

If a compound event consist in the conjunction of any number
of independent events, the probability of the compound
event is the product of the probabilities for the individual
events.

Strictly speaking, we have only proved this in the case of
two independent events, but the reader will find that the
previous proof by mathematical induction will apply absolutely
in this case also.


Example 7] A die is thrown 12 times; what is the probability
that the face 4 will appear just twice?

There are various ways of showing 2 fours and 10 not
fours, all mutually exclusive and equally likely. Hence the
answer is the probability of starting with 2 fours, and then
running 10 not fours, multiplied by the number of ways that
two objects can be chosen from 12, thus

$$\frac{1}{6} \times \frac{1}{6} \times \left(\frac{5}{6}\right)^{10} \times \frac{12 \times 11}{1 \times 2} = \frac{5^{10} \times 11}{6^{11}} = 0.296.$$
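The reasoning of Example 7 combines the two theorems: the product rule gives the chance of one particular order, and the sum rule adds the equally likely orders. A minimal sketch, with a helper `exactly` that is our own name and not the author's:

```python
from fractions import Fraction
from math import comb

def exactly(k, n, p):
    """Chance that an event of probability p happens just k times in n independent trials."""
    one_order = p ** k * (1 - p) ** (n - k)   # theorem of compound probability
    return comb(n, k) * one_order             # theorem of total probability

p_two_fours = exactly(2, 12, Fraction(1, 6))

print(p_two_fours, round(float(p_two_fours), 3))
```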

Example 8] A throws 3 coins, B throws 2; what is the
chance that A will throw a greater number of heads
than B?

Note the wording of the problem; A is not to throw as
many or more heads, but actually a greater number. This
can be done in three mutually exclusive ways. We give them,
with their chances:

A throws 3 heads                      $\frac{1}{8} \times 1 = \frac{1}{8}$.
A throws 2 heads, B does not          $\frac{3}{8} \times \frac{3}{4} = \frac{9}{32}$.
A throws 1 head, B throws 2 tails     $\frac{3}{8} \times \frac{1}{4} = \frac{3}{32}$.

Total probability                     $\frac{16}{32} = \frac{1}{2}$.
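The three mutually exclusive cases of Example 8 can be written out directly. In this sketch `heads(k, n)` is a hypothetical helper giving the chance of just k heads with n coins.

```python
from fractions import Fraction
from math import comb

def heads(k, n):
    """Chance of just k heads when n fair coins are thrown."""
    return Fraction(comb(n, k), 2 ** n)

cases = [
    heads(3, 3) * 1,                   # A throws 3 heads, whatever B does
    heads(2, 3) * (1 - heads(2, 2)),   # A throws 2 heads, B does not
    heads(1, 3) * heads(0, 2),         # A throws 1 head, B throws 2 tails
]
total = sum(cases)

print(cases, total)
```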


Example 9] A card is drawn at random from a pack and
replaced, then a second drawing is made, and so on.
How many drawings must be made in order to have
a chance of 1/2 that the ace of spades shall appear at
least once?

It is assumed that the cards are properly shuffled after each
drawing. The different drawings are, thus, independent
events, with the same probabilities each time. The chance
that the ace of spades will never appear in n drawings is $(\frac{51}{52})^n$.

We desire the contrary of this, namely,

$$1 - \left(\frac{51}{52}\right)^n = \frac{1}{2},$$

$$\left(\frac{51}{52}\right)^n = \frac{1}{2},$$

$$n = \frac{\log 2}{\log 52 - \log 51} = 36-,$$

that is, a little less than 36.

Example 10] In how many throws with a single die is there
an even chance that the number 6 will appear at least
once?

$$\left(\frac{5}{6}\right)^n = \frac{1}{2},$$

$$n = \frac{\log 2}{\log 6 - \log 5} = 4-.$$

Example 11] 2 dice are thrown; in how many turns is
there an even chance that double sixes will appear at
least once?

$$\left(\frac{35}{36}\right)^n = \frac{1}{2},$$

$$n = \frac{\log 2}{\log 36 - \log 35} = 25-.$$

These examples are of not a little historical interest. Two
dice can appear in six times as many ways as one die; with
one die there is more than an even chance to see the six in
four throws, while with two dice there is less than an even
chance to see double sixes in six times four, or twenty-four
throws. This simple fact is known as the 'paradox of
Chevalier de Méré', about which Pascal wrote to Fermat: *

'Voilà quel étoit son grand scandale, qui lui faisoit dire
hautement que les propositions n'étoient pas constantes, et
que l'arithmétique se démentoit.'
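The figures of Examples 9 to 11, and the comparison that troubled the Chevalier, can be reproduced with a few lines. This is a sketch only; the function names are ours.

```python
import math

def turns_for_even_chance(p):
    """Real number n for which (1 - p)**n = 1/2."""
    return math.log(2) / -math.log(1 - p)

def at_least_once(p, n):
    """Chance that an event of probability p appears at least once in n independent turns."""
    return 1 - (1 - p) ** n

for label, p in [("ace of spades", 1 / 52), ("six", 1 / 6), ("double six", 1 / 36)]:
    n = turns_for_even_chance(p)
    print(label, round(n, 2), "so", math.ceil(n), "turns are needed")

# One die in 4 throws against two dice in 24 throws.
print(round(at_least_once(1 / 6, 4), 4), round(at_least_once(1 / 36, 24), 4))
```

The required number of turns grows a little faster than in proportion to the number of equally likely ways, which is why twenty-four throws fall just short.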

Example 12] Three players A, B, and C play under the
following conditions. In each turn the chance for
success is the same for each of two contestants. A and
B play together the first turn, the winner plays with C,
and if he win again he wins the game; if not C plays
with the third man and so on until one man has won
two turns in succession; what is the chance for each
player?

Let us begin by showing that there is a zero chance that
the game will go on for ever. The only way that this could
happen would be for the winner of each turn to be other than
the man who won the turn before, and the chance for that is

$$\tfrac{1}{2} \times \tfrac{1}{2} \times \cdots = 0.$$

* Pascal, Œuvres, edition of 1819, vol. iv, p. 367.

A and B have equal chances. We first find C's chance,
then one-half the difference between that and 1 is the chance
of A or B. C might win his first two turns. Or he might
win his first, lose his second, and win his third and fourth,
after the man who beat him at his second turn has been defeated
by the other man, and so on. His chance is thus

$$\frac{1}{2}\cdot\frac{1}{2} + \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2} + \cdots = \frac{1}{4}\left[1 + \frac{1}{8} + \frac{1}{8^2} + \cdots\right] = \frac{4}{14} = \frac{2}{7}.$$

Chance for A or B is $\frac{5}{14}$.
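The result of Example 12 can also be reached by following the game turn by turn instead of summing the series. The sketch below is our own construction: it tracks, with exact fractions, the chance of each position after every turn and stops after a fixed number of turns, by which point the chance that the game is still going is negligible.

```python
from fractions import Fraction

def win_chances(turns=60):
    half = Fraction(1, 2)
    won = {"A": Fraction(0), "B": Fraction(0), "C": Fraction(0)}
    # After the first turn the winner (A or B, chance 1/2 each) meets C.
    # A position is (last winner, his next opponent, the man sitting out).
    positions = {("A", "C", "B"): half, ("B", "C", "A"): half}
    for _ in range(turns):
        following = {}
        for (champion, opponent, idle), chance in positions.items():
            # The last winner wins again: two turns in succession, game over.
            won[champion] += chance * half
            # Otherwise the opponent becomes last winner and meets the idle man.
            key = (opponent, idle, champion)
            following[key] = following.get(key, Fraction(0)) + chance * half
        positions = following
    return won

chances = win_chances()
print({player: round(float(c), 6) for player, c in chances.items()})
```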


Problems.

1. Let n dice be thrown. In how many throws is there an even chance
that all will appear sixes at least once? Show that this number is not
proportional to n as Chevalier de Méré supposed.

2. A popular, if unaristocratic game called 'craps' is played as follows.
Two dice are thrown, and one of the players will win if (a) the sum be 7 or
11, (b) if the sum be 4, 5, 6, 8, 9, or 10, and the same sum reappears before
7 is ever seen. What is the chance that this player will win?

3. In 1921 Lieutenant R. S. Hoar, U.S.A., drew five cards from a pack
1,000 times, with the following results. Two were of the same denomination
with three scattering 412 times, three were of the same denomination
and two scattering 23 times, two were of one denomination, two of another,
and the fifth of a third 5 times, three of one denomination and two of
another 1 time. Compare these figures with the numbers to be expected
by calculation.

Tchebycheff's Example] What is the probability that two
integers chosen at random shall be relatively prime? *

The chance that the first integer shall be divisible by
a prime r, is the chance that its remainder, when divided by r,
should be equal to zero. Assuming, then, that all remainders
are equally likely, the chance is 1/r. The chance that both
integers are divisible by r is therefore $1/r^2$. Hence the chance that
r is not a common factor of the two is $1 - 1/r^2$.

* Cf. Markoff, loc. cit., p. 148.

Our required probability is, then,

$$p = \operatorname{Limit} \prod_{r} \left(1 - \frac{1}{r^2}\right),$$

$$\frac{1}{p} = \text{Limit (for all primes)} \left[\operatorname{Lim}\left(1 + \frac{1}{2^2} + \frac{1}{2^4} + \cdots\right) \times \operatorname{Lim}\left(1 + \frac{1}{3^2} + \frac{1}{3^4} + \cdots\right) \times \cdots\right].$$

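These series are absolutely convergent, so that we are
allowed to rearrange the order of the terms and change the
order of the limits.*

$$\frac{1}{p} = \sum_{n=1}^{\infty} \frac{1}{n^2} = \operatorname{Lim}\left(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right).$$

Now

$$\int_0^1 x^n \log x\, dx = -\frac{1}{(n+1)^2},$$

as an integration by parts shows, since $\operatorname{Lim}_{x \to 0}\, x^{n+1} \log x = 0$.

$$\int_0^1 \log x\,(1 + x + x^2 + \cdots)\, dx = -\left(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right).$$

But †

$$\int_0^1 \frac{\log x\, dx}{1 - x} = -\frac{\pi^2}{6},$$

$$p = 6/\pi^2 = 0.608.$$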
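Tchebycheff's result lends itself to two rough numerical checks: the product over primes may be carried up to some limit, and the pairs of integers below some limit may simply be counted. Both limits in the sketch below (1,000 and 200) are arbitrary choices of ours, so the agreement is only approximate.

```python
import math

def prime_product(limit):
    """Product of (1 - 1/r^2) over all primes r not exceeding limit."""
    is_prime = [True] * (limit + 1)
    value = 1.0
    for r in range(2, limit + 1):
        if is_prime[r]:
            value *= 1 - 1 / r ** 2
            for multiple in range(r * r, limit + 1, r):
                is_prime[multiple] = False
    return value

def coprime_fraction(limit):
    """Proportion of pairs (a, b), 1 <= a, b <= limit, with no common factor."""
    hits = sum(1 for a in range(1, limit + 1)
                 for b in range(1, limit + 1) if math.gcd(a, b) == 1)
    return hits / limit ** 2

exact = 6 / math.pi ** 2
print(round(exact, 4), round(prime_product(1000), 4), round(coprime_fraction(200), 4))
```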

In  the  special  case  of  the  theorem  of  total  probability,  we 
calculated  the  chance  that  one  of  a  number  of  mutually 
exclusive  events  might  occur.  Since  the  events  are  mutually 
exclusive,  it  is  the  same  thing  to  calculate  the  probability 
that  one  should  happen,  or  that  at  least  one  should  happen. 
When,  however,  they  are  not  mutually  exclusive,  the  two 
probabilities  are  quite  different.  It  is  now  time  to  take  up 
this  '  at  least  one '  question  in  the  general  case. 

* Cf. Tannery, Théorie des fonctions d'une variable, 2nd ed., Paris, 1904,
vol. i, pp. 152 ff.

† Cf. B. O. Peirce, Short Table of Integrals, 2nd revised ed., Boston, 1910,
p. 64.

We begin with two events. Let their probabilities be $p_1$
and $p_2$, while the probability that both will happen is $p_{12}$.
The probability that at least one will occur is the probability
of the arrival of one of three mutually exclusive events,
namely, both happen, the first happens and the second fails,
the first fails and the second happens. Moreover, the probability
that the first happens is the sum of the probability
that both happen and the probability that the first happens
and the second fails; this latter probability will, then, have
the value $p_1 - p_{12}$.

The probability that we seek will thus be

$$(p_1 - p_{12}) + (p_2 - p_{12}) + p_{12} = p_1 + p_2 - p_{12}.$$
Let us now assume that when n - 1 events are concerned,
the probability that at least one happens is

$$P_{n-1} = \sum_{i=1}^{n-1} p_i - \frac{1}{2!}\sum_{i,j=1}^{n-1} p_{ij} + \frac{1}{3!}\sum_{i,j,k=1}^{n-1} p_{ijk} - \cdots,$$

where in any one term i, j, k, &c. take on distinct values.

By the same process of reasoning, when an nth event is
introduced, the probability that this, and at least one other
will occur is

$$P_{(n-1)n} = \sum_{i=1}^{n-1} p_{in} - \frac{1}{2!}\sum_{i,j=1}^{n-1} p_{ijn} + \frac{1}{3!}\sum_{i,j,k=1}^{n-1} p_{ijkn} - \cdots.$$

The probability that at least one of n events will occur
is, by the first case,

$$P_n = p_n + P_{n-1} - P_{(n-1)n} = \sum_{i=1}^{n} p_i - \frac{1}{2!}\sum_{i,j=1}^{n} p_{ij} + \frac{1}{3!}\sum_{i,j,k=1}^{n} p_{ijk} - \cdots \qquad (4)$$
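Formula (4) can be tried on a small case where everything may be counted. In the sketch below the three events on a throw of two dice are our own choice, made so that they are neither mutually exclusive nor independent; each sum runs over unordered sets of distinct events, which is the same thing as the ordered sums of the text divided by 2!, 3!, &c.

```python
from fractions import Fraction
from itertools import combinations, product

space = list(product(range(1, 7), repeat=2))   # 36 equally likely throws
events = [
    {w for w in space if w[0] == 6},           # first die shows six
    {w for w in space if w[1] == 6},           # second die shows six
    {w for w in space if sum(w) >= 10},        # sum is at least ten
]

def prob(cases):
    return Fraction(len(cases), len(space))

def at_least_one(events):
    """Formula (4): alternating sums of the probabilities of joint occurrence."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k - 1)
        for group in combinations(events, k):
            total += sign * prob(set.intersection(*group))
    return total

by_formula = at_least_one(events)
by_counting = prob(set.union(*events))
print(by_formula, by_counting)
```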


Problem.

What form does formula (4) take: (a) when the events are mutually
exclusive, (b) when they are independent? Prove your answer in each
case.

Theorem of total probability, general case.

If n different events be under consideration, and if the
probability for the simultaneous occurrence of the ith,
jth, kth, &c. event be $p_{ijk\ldots}$, then the probability for
the occurrence of at least one of these events is given by
formula (4).

There are not a great many interesting applications of this
beautiful general formula, largely owing to the difficulty of
calculating the different p's. We shall, however, give two.
The first is an example worked out by De Montmort nearly
two hundred years ago.*

De Montmort's Example] If n balls in an urn be numbered
1, 2, 3, ..., n respectively, and if they be drawn out at
random, one after another, what is the probability that
at least one will appear in the turn corresponding to
its number?

The probability that a specific set of k balls shall come
out in the right order is $\frac{(n-k)!}{n!}$.

The probability that some one set of k will come in order
is this number multiplied by the number of ways that k
objects may be chosen from n objects, namely,

$$\frac{n!}{k!\,(n-k)!} \times \frac{(n-k)!}{n!} = \frac{1}{k!}.$$

Our required probability is, thus,

$$1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n-1}\frac{1}{n!}.$$


The probability that no ball will come in the right place is

$$1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n \frac{1}{n!}.$$

These are the first terms of a familiar rapidly converging
series, the difference between the sum written above and the
sum to infinity being less than 1/(n + 1)!; we thus get the
curious

Theorem] If any large number of balls be numbered 1, 2, ..., n,
and if they be drawn out one after another from an
urn, the probability that no ball will appear in the
turn corresponding to its number is very close to 1/e.

* Essai d'analyse sur les jeux de hasard, 2nd ed., Paris, 1713, p. 132.

A colleague of the author's once stated the theorem in the
following more picturesque language:

'If all of the inhabitants of Chicago should meet together
in one place and get extremely drunk, and then try to go
home by guess-work, the chances that at least one would get
back to his own bed are almost two out of three.'

This is one of those cases where it is fortunate that the
probability can be calculated beforehand, and we are not
forced to seek it experimentally.
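For small numbers of balls De Montmort's result can nevertheless be checked by listing every order of drawing. The sketch below compares such a count with the alternating series and with 1/e; the function names are ours.

```python
import math
from fractions import Fraction
from itertools import permutations

def no_match_series(n):
    """1 - 1/1! + 1/2! - ... + (-1)^n / n!"""
    return sum(Fraction((-1) ** k, math.factorial(k)) for k in range(n + 1))

def no_match_by_count(n):
    """Proportion of the n! orders in which no ball comes out in its own turn."""
    orders = list(permutations(range(n)))
    good = sum(1 for order in orders if all(order[i] != i for i in range(n)))
    return Fraction(good, len(orders))

for n in range(1, 8):
    print(n, no_match_by_count(n), float(no_match_series(n)))
print(1 / math.e)
```

Already with seven balls the chance agrees with 1/e to four places of decimals, so that the size of the city matters very little.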

The  general  theorem  of  total  probability  enables  us  to  set 
a  limit,  unfortunately  not  a  very  close  one,  to  the  size  of 
a  composite  probability,  when  we  know  the  values  of  the 
individual  probabilities  involved,  but  do  not  know  to  what 
extent  they  depend  on  one  another. 

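Since $p_1 + p_2 - p_{12} \le 1$, we have

$$p_{12} \ge p_1 + p_2 - 1.$$

Assume

$$p_{12\ldots n-1} \ge p_1 + p_2 + \cdots + p_{n-1} - (n-2);$$

then, since

$$p_{12\ldots n} \ge p_{12\ldots n-1} + p_n - 1,$$

$$p_{12\ldots n} \ge p_1 + p_2 + \cdots + p_n - (n-1).$$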
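The limit just obtained is easily computed, and a small constructed case shows that it can actually be reached. In this sketch the probabilities 9/10, 8/10 and 7/10 and the ten equally likely cases are invented for illustration only.

```python
from fractions import Fraction

def lower_limit(probabilities):
    """p_1 + p_2 + ... + p_n - (n - 1), the limit found above."""
    return sum(probabilities) - (len(probabilities) - 1)

# Ten equally likely cases; each event fails on a different set of cases.
cases = set(range(10))
events = [cases - {9}, cases - {7, 8}, cases - {4, 5, 6}]
probabilities = [Fraction(len(e), len(cases)) for e in events]

bound = lower_limit(probabilities)
all_together = Fraction(len(set.intersection(*events)), len(cases))
print(bound, all_together)
```

When the failures overlap instead of being kept apart, the composite probability is larger than the limit, which is why the limit is not a very close one.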

Problem.

If it be found that 91% of the recruits of an army satisfy the first of
three medical requirements, 86% satisfy the second, and 83% satisfy the
third, what will be a lower limit for the proportion of those satisfying all
three?

§ 3. Expectation.

Definition] If a person have the chance $p_1$ to receive the
positive or negative sum $s_1$, $p_2$ to receive $s_2$, ..., $p_n$ to receive
the sum $s_n$, and if these be the only sums he has a chance
to receive under the circumstances, then the sum

$$\sum_{i=1}^{n} p_i s_i$$

is called his expectation under the circumstances.

Theorem 1] The expectation is the limit of the average sum
received as the number of trials increases indefinitely.

To prove this let us notice that if in N trials, the sum $s_1$ be
received $T_1$ times, $s_2$, $T_2$ times, ..., $s_n$, $T_n$ times, the average
amount received is $(T_1 s_1 + T_2 s_2 + \cdots + T_n s_n)/N$. The limit of
this sum is the sum of the limits of the individual terms, and as

$$\operatorname{Lim} \frac{T_i}{N} = p_i,$$

we have our theorem proved.
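Theorem 1 may be illustrated by an artificial experiment. The sketch below supposes, purely as an example of ours, a person who receives as many pence as a fair die shows; it compares the expectation with the average received over a long run of simulated trials. The seed is fixed so that the run can be repeated.

```python
import random
from fractions import Fraction

def expectation(chances, sums):
    """The sum of p_i * s_i over all the sums that may be received."""
    return sum(p * s for p, s in zip(chances, sums))

def average_received(chances, sums, trials, seed=0):
    """Average sum received in a run of simulated independent trials."""
    rng = random.Random(seed)
    received = rng.choices(sums, weights=[float(p) for p in chances], k=trials)
    return sum(received) / trials

sums = [1, 2, 3, 4, 5, 6]
chances = [Fraction(1, 6)] * 6

print(expectation(chances, sums))
for trials in (100, 10_000, 200_000):
    print(trials, average_received(chances, sums, trials))
```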

The  subject  of  expectation  is  used  especially  in  connexion 
with  games  of  chance.  This  branch  of  the  theory  of  probabi- 
lity has  always  had  a  peculiar  fascination  for  a  certain  type 
of  reader,  and  was,  moreover,  the  historic  basis  of  the  whole 
science.  We  shall  therefore  pay  some  attention  to  it  both  in 
the  present  chapter  and  in  subsequent  ones,  even  though  at 
the  present  time  the  calculus  of  probabilities  is  principally 
occupied  with  more  serious  matters. 

Definition]  A  turn  at  a  game  of  chance  is  said  to  be 
fair  to  a  prospective  player,  when  his  expectation  is  0,  it 
shall  be  called  favourable,  when  his  expectation  is  positive, 
otherwise  unfavourable.  In  the  same  way  a  whole  game 
shall  be  called  fair,  favourable,  or  unfavourable,  according  to 
the  expectations. 

Suppose,  for  instance,  that  a  player  stake  a  sum  a,  with 
a  chance  p  of  winning  his  adversary's  stake  b,  while  the 
chance  of  loss,  a  tie  being  excluded,  is  q.     His  expectation  is 

pb  —  qa. 

If  the  turn  be  fair  this  is  0,  and 

$$p/q = a/b; \qquad p/a = q/b. \qquad (1)$$

Theorem 2] If a turn be fair to a player, it is fair to his
adversary, and the probability of success for each is
proportional to his stake.

Suppose  that  a  player  plays  two  successive  turns ;  let  the 

probabilities and the stakes be $p_1, q_1, a_1, b_1$ in the first, and
$p_2, q_2, a_2, b_2$ in the second. Let us find his total expectation.

$$p_1 p_2 (b_1 + b_2) + p_1 q_2 (b_1 - a_2) + p_2 q_1 (b_2 - a_1) - q_1 q_2 (a_1 + a_2)$$

$$= (p_1 b_1 - q_1 a_1) + (p_2 b_2 - q_2 a_2).$$

Evidently  we  might  carry  on  to  any  number  of  turns,  by 
mathematical  induction : 

Theorem 3] A player's expectation from a series of turns is
the sum of his expectations from the individual turns.

Theorem 4] Any succession of fair turns, favourable turns,
or unfavourable turns, will constitute a fair game,
a favourable game, or an unfavourable game, as the
case may be.

This  theorem  is  vitally  important,  and  shows  the  utter 
futility  of  a  player's  altering  the  amount  of  his  stakes  in  any 
game,  in  the  hope  of  improving  his  chances.  We  shall  discuss 
this  question  in  greater  detail  in  the  next  chapter. 

Example 13] A has three pennies, B has two. The coins are
all thrown, and it is agreed that the player showing
the greatest number of heads shall win all; in case
of a tie B shall win. How is this game from A's
point of view?

A can win (a) by throwing three heads, for which the
chance is $\tfrac18$, (b) by throwing two heads to one or no heads,
chance $\tfrac38 \times \tfrac34$, or (c) by throwing one head against two tails,
chance $\tfrac38 \times \tfrac14$. His total chance is $\tfrac12$, and his expectation

$$2 \times \tfrac12 - \tfrac12 \times 3 = -\tfrac12.$$

The game is, thus, unfavourable to A, a rather surprising
result.
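
The count in Example 13 can be checked by listing all 32 equally likely throws of the five coins. The sketch below is only an illustration; the function name `chance_a_wins` is ours, and the stakes are taken to be the three and two pennies held by A and B.

```python
from fractions import Fraction
from itertools import product

def chance_a_wins(coins_a=3, coins_b=2):
    """Exact chance that A shows strictly more heads than B (a tie goes to B)."""
    wins = 0
    total = 0
    for throw in product((0, 1), repeat=coins_a + coins_b):
        heads_a = sum(throw[:coins_a])
        heads_b = sum(throw[coins_a:])
        total += 1
        if heads_a > heads_b:
            wins += 1
    return Fraction(wins, total)

p_win = chance_a_wins()
# A stands to win B's 2 pennies and to lose his own 3.
expectation = 2 * p_win - 3 * (1 - p_win)
print(p_win, expectation)
```

The enumeration treats every coin as fair and independent, which is the assumption made in the text.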

Problem.

How will the game appear to A if they agree to begin again in case of
a tie?

Petrograd paradox. A spins a penny, and agrees to give
it to B if it come up heads. If it do not come up
heads till the second time he will give B 2 pence, if not
till the third time 4, if not till the nth, $2^{n-1}$. How
much should B pay for the privilege of taking part in
this pleasant game?

The  game  will  be  fair  if  B  agree  to  pay  his  expectation  as 
an  entrance  fee,  let  us  therefore  calculate  this  expectation ; 
it  is  clearly 

$$\frac12 \cdot 1 + \frac14 \cdot 2 + \frac18 \cdot 2^2 + \dots + \frac{1}{2^n} \cdot 2^{n-1} + \dots = \frac12\,[1 + 1 + 1 + \dots].$$

The  absurdity  of  this  answer  constitutes  the  paradox  which 
has  given  rise  to  a  good  deal  of  discussion,  serving  Daniel 
Bernoulli as the basis of his theory of moral value.* Theoretically
B's expectation is infinite; practically, as Bertrand
remarks,† any one would be a fool to risk 100 pence at any
such game. B's expectation is infinite provided the possibility
of  an  infinite  number  of  turns  is  admitted,  and  provided,  of 
course,  that  he  has  an  infinite  fortune  to  start  with.  Neither 
of  these  provisos  is  related  to  actual  life.  Let  us  see  how 
many  times  the  coin  will  be  spun  on  an  average  before  heads 
come  up.  This  is  a  problem  in  mean  value  of  the  sort  that 
we  shall  take  up  at  length  in  Ch.  IV,  but  it  will  be  sufficient 
for  our  present  purposes  to  notice  that  this  number  is  the 
expectation  of  a  man  who  shall  receive  a  penny  if  heads 
appear  the  first  time,  two,  if  not  till  the  second,  three,  if  not 
till  the  third,  &c.     His  expectation  is,  then, 

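$$\frac12 \cdot 1 + \frac14 \cdot 2 + \frac18 \cdot 3 + \dots + \frac{1}{2^n} \cdot n$$

$$= \frac12\left(\frac{1}{2^0} + \frac{2}{2^1} + \frac{3}{2^2} + \frac{4}{2^3} + \dots + \frac{n}{2^{n-1}}\right)$$

$$= 2 - \frac{n+2}{2^n}.$$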

The average number of turns will not, therefore, exceed
two. Suppose that in the lifetime of A and B it will be
possible to play $2^n$ games. As a further simplification, we
suppose that B wins one penny about half the time, twopence
one-quarter of the time, fourpence one-eighth of the
time. If he pay an entrance fee of $x$ each time, and if this
be a fair fee for $2^n$ games, we have

$$2^n x = 2^{n-1} \cdot 1 + 2^{n-2} \cdot 2 + 2^{n-3} \cdot 2^2 + \dots + 1 \cdot 2^{n-1} + A.$$

The reason for the remainder term $A$ is that

$$2^n = 1 + 2^{n-1} + 2^{n-2} + \dots + 2 + 1,$$

and we do not know what sum to ascribe to the odd turn,
but surely $A < 2^n$, and

$$2^n x = n\,2^{n-1} + A.$$

Now if $2^n$ be allowed to increase indefinitely, so will $x$,
but that is not our present hypothesis. Let A and B play
100 turns per hour, working 8 hours a day, 300 days in the
year, for 50 years; the number of games would be 12,000,000,
which is less than $2^{24}$, so that an entrance fee of twelve pence
would seem quite sufficient.

* 'Specimen Theoriae novae de Mensura Sortis', Commentarii Academiae
Scientiarum Imperialis, vol. v, p. 175, Petrograd, 1738. For further references,
see Czuber, Entwicklung, cit. pp. 122 ff.

† loc. cit., p. 63.
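
The arithmetic of this estimate is easy to reproduce. The following sketch is an illustration only: the names `games`, `n` and `fee_without_remainder` are ours, and the remainder term $A$ is simply dropped, as the text does when it arrives at twelve pence.

```python
from fractions import Fraction

# Number of games in a working lifetime, on the text's assumptions:
# 100 turns per hour, 8 hours a day, 300 days a year, 50 years.
games = 100 * 8 * 300 * 50

# Smallest n for which 2**n games would cover that lifetime.
n = 0
while 2 ** n < games:
    n += 1

def truncated_expectation(n):
    """Sum of the first n terms (1/2^k) * 2^(k-1) of the Petrograd series."""
    return sum(Fraction(1, 2 ** k) * 2 ** (k - 1) for k in range(1, n + 1))

# x = n/2 + A/2**n; ignoring the remainder A leaves n/2.
fee_without_remainder = truncated_expectation(n)
print(games, n, fee_without_remainder)
```

Each term of the series contributes exactly one half, so cutting the series off after $n$ terms gives a fee of $n/2$.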

Let us interpolate at this point a problem of historical
interest which was proposed to Pascal by our old friend the
Chevalier de Méré.*

Example 14] Two players whose chances of winning an
individual turn are p and q respectively, a tie being
impossible, are forced to break off a game before the end.
The first player A is within m turns of victory, while
the second player B is within n turns of victory; how
should the stakes be divided?

We must calculate the chance of one player, say A. Here
is De Montmort's solution.†

A may win in various ways which are mutually exclusive:

(1) He may win the next m turns, chance $= p^m$.

(2) In the next m + 1 turns he may win the last, and some
other m − 1, chance $= m\,p^m q$.

(3) In the next m + 2 turns he may win the last, and m − 1
of the others, chance $= \dfrac{m(m+1)}{2}\,p^m q^2$.

. . .

Hence his chance of winning the game is

$$p^m\left[1 + mq + \frac{m(m+1)}{2!}\,q^2 + \dots + \frac{m(m+1)\dots(m+n-2)}{(n-1)!}\,q^{n-1}\right].$$

* Pascal, Œuvres, cit., vol. iv, p. 360.
† loc. cit., p. 244.
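
De Montmort's sum can be evaluated directly. The sketch below is a hedged illustration of the formula just given; the function name `chance_first_player` is ours, and exact fractions are used so that the two players' chances can be seen to add up to one.

```python
from fractions import Fraction
from math import comb

def chance_first_player(p, m, n):
    """Chance that A, who needs m turns, wins before B, who needs n turns.

    Sums p^m * C(m+k-1, k) * q^k for k = 0 .. n-1, where
    C(m+k-1, k) = m(m+1)...(m+k-1)/k! is the coefficient in the text.
    """
    q = 1 - p
    return sum(comb(m + k - 1, k) * p ** m * q ** k for k in range(n))

half = Fraction(1, 2)
a_share = chance_first_player(half, 2, 3)   # A needs 2 turns, B needs 3
b_share = chance_first_player(half, 3, 2)   # the same game seen from B's side
print(a_share, b_share)
```

With equal chances and A two turns from victory against B's three, the stakes divide as 11 to 5.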


§ 4. Risk.

It sometimes happens that we are interested in knowing,
not  merely  the  total  expectation,  but  the  negative  part  which 
is  to  be  feared.  For  instance,  a  player  who  should  undertake 
to  play  the  Petrograd  game  because  of  the  infinite  expectation 
would  do  a  foolish  thing,  thanks  to  the  large  negative  part 
of  the  expectation.  More  generally,  a  player  should  not  enter 
a  game,  no  matter  how  brilliant  the  prospects,  if  the  expecta- 
tion of  loss  be  too  large  a  proportion  of  his  fortune. 

Definition] The absolute value of that part of the total
expectation which includes the negative terms, and
these only, shall be called the risk.

Suppose that a man has the chances $p_1, p_2, \dots p_n$ to win the
positive sums $s_1, s_2, \dots s_n$. He will pay for the privilege of
entering a sum equal to his expectation $e$. Arranging the
sums in the decreasing order of magnitude, his expectation
of gain is

$$\sum_{i=1}^{i=m} p_i (s_i - e), \qquad s_i > e,$$

while his risk is

$$r = \sum_{i=m+1}^{i=n} p_i (e - s_i), \qquad e > s_i.$$

Suppose that he insures himself against loss by paying this
sum to a speculative company which agrees to pay whatever
loss he may sustain in the game.
His present expectation of gain is

$$\sum_{i=1}^{i=m'} p_i (s_i - e - r), \qquad s_i > e + r,$$

his present risk is

$$\sum_{i=m'+1}^{i=m} p_i (r + e - s_i) + \sum_{i=m+1}^{i=n} p_i r = \sum_{i=m'+1}^{i=m} p_i (e - s_i) + \sum_{i=m'+1}^{i=n} p_i r.$$

Now

$$r = \sum_{i=1}^{i=n} p_i r; \qquad \sum_{i=m'+1}^{i=m} p_i (e - s_i) < 0.$$

Hence,  for  both  reasons,  the  risk  is  less  than  r  the  previous 
risk,  but  the  expectation  of  gain  is  reduced  by  a  corresponding 
figure. 

If a man have the chances $q_1, q_2, \dots q_n$ to lose the sums
$s_1, s_2, \dots s_n$, his risk will be $q_1 s_1 + q_2 s_2 + \dots + q_n s_n$, which will
be  a  minimum  if  all  the  money  be  placed  on  the  safest  chance, 
but  the  chance  of  total  loss  will,  of  course,  be  much  greater. 

Let  us  conclude  by  returning,  for  a  moment,  to  the  question 
of  fair  and  unfair  turns.  It  sometimes  occurs  to  a  player 
that  he  will  be  sure  to  win  a  game  if  he  make  the  resolution, 
and  stick  to  it,  which  may  be  difficult,  of  stopping  play  as 
soon as he has lost a turn. Assuming that his stake is $a$, his
adversary's $b$, the respective chances (tie excluded) $p$ and $q$,
and that the game is fair, we may change the unit of coinage
so that his stake is $p$, and his adversary's $q$. His expectation
is then

$$-pq + pq(q - p) + p^2 q(2q - p) + \dots.$$

The first term, and perhaps some of the subsequent ones, is
negative, but we soon find positive terms, so that the risk
is small. Evaluating this we get

$$-pq\,[1 + (p - q) + p(p - 2q) + \dots]$$

$$= -pq\,[1 + (p - \{1 - p\}) + p(p - 2\{1 - p\}) + \dots]$$

$$= 0.$$

The  resolution  to  stop  after  the  first  loss  is  unwise,  except 
for  a  very  poor  man.  It  will  discontent  the  adversary,  because 
it  seems  unsportsmanlike,  but  will  not  change  a  zero  expecta- 
tion into  a  positive  one. 
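
The vanishing of this series can be confirmed numerically. The sketch below is an illustration under the text's assumptions (stake $p$ against the adversary's $q = 1 - p$, play stopping at the first loss); the function name and the cut-off `max_wins` are ours, the latter standing in for the infinite series.

```python
def expectation_stop_at_first_loss(p, max_wins=2000):
    """Partial sum of -pq + pq(q - p) + p^2 q(2q - p) + ...

    The term for k wins followed by one loss has chance p^k * q
    and net result k*q - p.
    """
    q = 1.0 - p
    total = 0.0
    for k in range(max_wins):
        total += p ** k * q * (k * q - p)
    return total

for p in (0.1, 0.3, 0.5, 0.9):
    print(p, expectation_stop_at_first_loss(p))
```

For every chance tried the partial sums settle at zero to within rounding error, in agreement with the evaluation above.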


CHAPTER  III 

BERNOULLI'S  THEOREM 

§  I.    The  Problem  of  Repeated  Trials. 

The celebrated theorem which gives the title to the present
chapter  is  of  central  importance  in  the  theory  of  mathematical 
probability.  Certain  persons  have  argued  that  it  gives 
a  proof  of  our  second  empirical  assumption.  This  is  an  error. 
No  mathematical  formula  can  prove  this  assumption  which  is 
deduced  from  experience  of  concrete  cases.  The  confusion 
arises  from  the  fact  that  the  theorem  deals  with  ratios  which 
arise  when  an  experiment  is  tried  a  large  number  of  times 
under  identical  conditions. 

Fundamental example] The probability for success in a certain
trial is p, the contrary probability for failure is
q = 1 − p. If n trials be made under the same essential
conditions, what is the probability for exactly r successes
and n − r failures?

The probability of starting off with r successes, and following
with failures thereafter is $p^r q^{n-r}$, and the probability
sought is the product of this multiplied by the number of
ways in which the n trials can be divided into r successes
and n − r failures, namely

$$\frac{n!}{r!\,(n-r)!}\,p^r q^{n-r}. \tag{1}$$

Before  deducing  results  from  this  very  important  formula, 
we shall give one or two auxiliary results which are of
interest. 
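
Formula (1) translates directly into a few lines of code. This is a minimal sketch, with the function name `prob_exactly` chosen by us; exact fractions are used so that the probabilities for r = 0, 1, ... n can be checked to sum to one.

```python
from fractions import Fraction
from math import comb

def prob_exactly(r, n, p):
    """Formula (1): chance of exactly r successes in n trials."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)

p = Fraction(1, 2)
table = [prob_exactly(r, 4, p) for r in range(5)]
print(table)   # chances of 0, 1, 2, 3, 4 heads in four throws of a coin
```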

Example 1] The probabilities being as above, it is agreed
that a man shall receive one dollar for each trial
necessary to achieve exactly r successes; what is his
expectation?

This amounts, by theorem 1 of the last chapter, to asking
what will be the average number of trials necessary to achieve
r successes. If this number be n we have

$$n = r p^r + (r+1)\,r\,p^r q + (r+2)\,\frac{r(r+1)}{2!}\,p^r q^2 + \dots$$

$$= r p^r \sum_{k=0}^{k=\infty} \frac{(r+k)!}{r!\,k!}\,q^k$$

$$= r p^r (1-q)^{-(r+1)}$$

$$= r/p. \tag{2}$$

This suggests the idea that since the average value of n,
when r is given, is r/p, so the average value of r, when n is
given, will be np, and this we shall soon see to be the case.
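
Result (2) can be checked by summing the series numerically. The sketch below is an illustration; the function name `expected_trials` and the cut-off `terms` are ours, and each term is obtained from the one before it so as to avoid very large factorials.

```python
def expected_trials(r, p, terms=2000):
    """Partial sum of the series for the average number of trials
    needed to achieve r successes."""
    q = 1.0 - p
    prob = p ** r          # chance that the r-th success falls on trial r
    total = 0.0
    for k in range(terms):
        total += (r + k) * prob
        # chance that the r-th success falls on trial r + k + 1
        prob *= (r + k) / (k + 1) * q
    return total

print(expected_trials(3, 0.25))   # compare with r/p = 12
print(expected_trials(1, 0.5))    # compare with r/p = 2
```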

Example 2] In n successive trials of an event the probabilities
for success being $p_1, p_2, \dots p_n$ respectively, what is the
probability for just r successes?

Let the equation whose roots are $p_1, p_2, \dots p_n$ respectively
be written

$$(x - p_1)(x - p_2) \dots (x - p_n) = x^n - s_1 x^{n-1} + s_2 x^{n-2} - \dots = 0.$$

The probability sought will be

$$\sum p_i p_j \dots p_k\,(1 - p_l)(1 - p_m) \dots,$$

where the first products give every term of $s_r$, and the p's in
the last factors are those which do not appear in the first
ones. Multiplying out we get

$$s_r - (r+1)\,s_{r+1} + \frac{(r+1)(r+2)}{2!}\,s_{r+2} - \dots. \tag{3}$$

This  is  known  as  '  King's  formula '. 
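
Formula (3) can be compared with a direct enumeration of which trials succeed. The sketch below is an illustration; the function names are ours, $s_k$ is computed as the sum of the products of the p's taken k at a time, and the coefficients $1, (r+1), (r+1)(r+2)/2!, \dots$ are written as binomial coefficients.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod

def elementary_symmetric(ps, k):
    """s_k: sum of the products of the p's taken k at a time (s_0 = 1)."""
    return sum(prod(c) for c in combinations(ps, k))

def prob_r_by_formula_3(ps, r):
    """s_r - (r+1) s_{r+1} + (r+1)(r+2)/2! s_{r+2} - ..."""
    n = len(ps)
    return sum((-1) ** k * comb(r + k, k) * elementary_symmetric(ps, r + k)
               for k in range(n - r + 1))

def prob_r_by_enumeration(ps, r):
    """Add up the chance of every way of choosing which r trials succeed."""
    n = len(ps)
    return sum(prod(ps[i] if i in winners else 1 - ps[i] for i in range(n))
               for winners in combinations(range(n), r))

ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
print([prob_r_by_formula_3(ps, r) for r in range(4)])
```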

Problems.

1. Deduce formula (1) as a special case of (3).

2. An event can happen in k mutually exclusive ways, and no others,
the respective probabilities being $p_1, p_2, \dots p_k$. Find the probability that in
n trials it will happen in the first way $r_1$ times, in the second $r_2$ times, in
the kth $r_k$ times.

We now return to our formula (1) and ask the important
question, for what value of r will this be a maximum? To
find this maximum, we write the ratios of this term to the
preceding and to the succeeding ones. The first ratio will
be greater than or equal to unity when

$$\frac{n - r + 1}{r} \cdot \frac{p}{q} \ge 1,$$

$$(n+1)\,p \ge r.$$

In the same way, the second ratio will be greater than or
equal to unity when

$$\frac{r+1}{n-r} \cdot \frac{q}{p} \ge 1,$$

$$rq + q \ge np - rp,$$

$$r \ge np - q.$$

We have, thus, for our largest term

$$np + p \ge r \ge np - q.$$

The two limits differ from one another by unity.

Theorem 1] If the probability for success of an event be p, that
for failure q, in n trials the most likely number of
successes will be that integer which lies between the
limits np + p and np − q.
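
Theorem 1 is readily tested against formula (1). The sketch below is an illustration with names of our own choosing; it finds the most likely number of successes by inspecting every term and compares it with the two limits.

```python
from fractions import Fraction
from math import comb

def most_likely_successes(n, p):
    """Value of r for which formula (1) is greatest, found by inspection."""
    probs = [comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(n + 1)]
    return max(range(n + 1), key=probs.__getitem__)

n, p = 10, Fraction(3, 10)
q = 1 - p
r_best = most_likely_successes(n, p)
print(r_best, n * p - q, n * p + p)   # r_best should lie between the limits
```

When np + p happens to be a whole number, two neighbouring values of r are equally likely and this sketch reports the smaller.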

In practice, it is usual to take np as the most likely value
for r; the number

$$d = r - np \tag{4}$$

is called the discrepancy. Let us find its average value, that
is to say, the expectation of a man who will receive a sum
equal to this discrepancy.

$$\sum_{r=0}^{r=n} (r - np)\,\frac{n!}{r!\,(n-r)!}\,p^r q^{n-r} = \sum_{r=0}^{r=n} r\,\frac{n!}{r!\,(n-r)!}\,p^r q^{n-r} - np\,(p+q)^n.$$

Now

$$r\,p^r q^{n-r} = p\,\frac{\partial}{\partial p}\left(p^r q^{n-r}\right); \qquad p + q = 1.$$

Hence we have

$$p\,\frac{\partial}{\partial p}(p+q)^n - np\,(p+q)^n = np - np = 0.$$

This shows that np is not only the most likely value of r,
but is also its average value. Let us now find the average
value of the square of the discrepancy. This will be

$$\sum_{r=0}^{r=n} (r - np)^2\,\frac{n!}{r!\,(n-r)!}\,p^r q^{n-r} = \sum_{r=0}^{r=n} r^2\,\frac{n!}{r!\,(n-r)!}\,p^r q^{n-r} - n^2 p^2 \sum_{r=0}^{r=n} \frac{n!}{r!\,(n-r)!}\,p^r q^{n-r}.$$

But

$$r^2\,p^r q^{n-r} = p\,\frac{\partial}{\partial p}\left(p\,\frac{\partial}{\partial p}\,p^r q^{n-r}\right).$$

Hence our first term is

$$p\,\frac{\partial}{\partial p}\left[p\,\frac{\partial}{\partial p}(p+q)^n\right] = p\,\frac{\partial}{\partial p}\left[np\,(p+q)^{n-1}\right]$$

$$= np\,(p+q)^{n-1} + n(n-1)\,p^2 (p+q)^{n-2}$$

$$= np + n^2 p^2 - np^2.$$

Our average value is thus

$$np - np^2 = npq. \tag{5}$$

The expression

$$d/n = r/n - p \tag{6}$$

is called the relative discrepancy, for it is the discrepancy
between the actual proportion of successes and the average
proportion of successes. The average value of its square is

$$pq/n. \tag{7}$$
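
The average values just found can be verified by direct summation over formula (1). The following is a minimal sketch; the function name `discrepancy_moments` is ours, and exact fractions are used so that the results agree with 0, npq and pq/n exactly.

```python
from fractions import Fraction
from math import comb

def discrepancy_moments(n, p):
    """Average of d = r - np and of d^2, summed over formula (1)."""
    q = 1 - p
    weights = [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]
    mean = sum((r - n * p) * w for r, w in enumerate(weights))
    mean_square = sum((r - n * p) ** 2 * w for r, w in enumerate(weights))
    return mean, mean_square

n, p = 12, Fraction(1, 3)
mean, mean_square = discrepancy_moments(n, p)
print(mean, mean_square, mean_square / n ** 2)   # 0, npq, pq/n
```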

Let us see what is the probability of a discrepancy not
greater numerically than a given positive number D. The
number of such discrepancies is 2D + 1 and the probability is
less than this number multiplied by the probability of a zero
discrepancy. Let us calculate the variation of this latter as
n increases indefinitely. We have to find the limit of

$$\frac{n!}{r!\,(n-r)!}\left(\frac{r}{n}\right)^{r}\left(\frac{n-r}{n}\right)^{n-r}, \qquad r = np, \quad n \to \infty.$$

Let us first find for what value of r this will be a minimum,
n being fixed. Changing r to r + 1 we get

$$\frac{n!}{(r+1)!\,(n-r-1)!}\left(\frac{r+1}{n}\right)^{r+1}\left(\frac{n-r-1}{n}\right)^{n-r-1}.$$

This divided by the previous expression will be

$$\left(1 + \frac{1}{r}\right)^{r}\left(1 - \frac{1}{n-r}\right)^{n-r-1}.$$

How long will this expression be as great as 1? Evidently
as long as its logarithm is not negative.

$$r \log\left(1 + \frac{1}{r}\right) + (n-r)\log\left(1 - \frac{1}{n-r}\right) - \log\left(1 - \frac{1}{n-r}\right) \ge 0.$$

Approximately, if r and n be very large,

$$\frac{1}{2(n-r)} - \frac{1}{2r} > 0,$$

$$n - r < r.$$

Theorem 2] When the number of trials is given, the probability
of a zero discrepancy is minimum when the
probability of success for an individual trial is equal
to one half.

It would really be more interesting to know for what
value of r this probability would be a maximum, not a minimum,
but this problem does not yield to treatment so easily
as the other. Instead we shall show that for every value
of r, other than n or 0, the probability of a 0 discrepancy
approaches 0 as a limit, when n increases indefinitely.

The probability for a discrepancy d = r − np is

$$\frac{n!}{(np+d)!\,(nq-d)!}\,p^{np+d} q^{nq-d},$$

and

$$\sum_{d} \frac{n!}{(np+d)!\,(nq-d)!}\,p^{np+d} q^{nq-d} = 1.$$

The greatest term is

$$\frac{n!}{(np)!\,(nq)!}\,p^{np} q^{nq}.$$

Suppose that pq ≠ 0 and that, contrary to fact, this greatest
term is always > K > 0.

Then

$$\frac{n!}{(np+d)!\,(nq-d)!}\,p^{np+d} q^{nq-d} = \frac{nq\,(nq-1)\dots(nq-d+1)}{(np+1)(np+2)\dots(np+d)}\left(\frac{p}{q}\right)^{d}\frac{n!}{(np)!\,(nq)!}\,p^{np} q^{nq}$$

$$> \frac{nq\,(nq-1)\dots(nq-d+1)}{(np+1)(np+2)\dots(np+d)}\left(\frac{p}{q}\right)^{d} K.$$

For any fixed d, the limit of this as n → ∞ is K. Now
let ν be so large that νK > 1. Then let n be so large that
np > ν, nq > ν. The limit of each of the ν largest terms is
K, hence the limit of their sum is νK > 1, which is inconsistent
with the fact that the sum of all terms is 1. This
proves the falsity of the assumption that the largest term is
always greater than K. The probability of a discrepancy
will thus approach 0 as a limit. The same is true of 2D + 1
times this probability, a number greater than the probability
of a discrepancy numerically not greater than D.

In  the  case  of  the  relative  discrepancy  the  matter  is  exactly 
reversed.     We  see  from   (7)  that   the  average  value  of  the 
square  of  this  decreases   indefinitely  as    n   increases,  hence 
the  probability  that  the  square  should  have  a  value  as  great 
as  any  finite  value  approaches  zero  as  a  limit.     We  thus  get 
Bernoulli's complete theorem] When the number of trials is
increased indefinitely, the probability that the discrepancy
shall remain numerically less than any given
number, and the probability that the relative discrepancy
shall remain numerically greater than any
given number, will both approach zero as a limit.
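
Both halves of the theorem can be watched at work numerically. The sketch below is an illustration only; the function names are ours, p is assumed to lie strictly between 0 and 1, and the terms of formula (1) are computed through logarithms (`math.lgamma`) so that large n does not overflow.

```python
from math import exp, lgamma, log

def binomial_term(n, r, p):
    """Formula (1), computed through logarithms to allow large n."""
    log_ways = lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)
    return exp(log_ways + r * log(p) + (n - r) * log(1 - p))

def prob_discrepancy_at_most(n, p, D):
    """Chance that |r - np| does not exceed D."""
    return sum(binomial_term(n, r, p) for r in range(n + 1)
               if abs(r - n * p) <= D)

def prob_relative_discrepancy_exceeds(n, p, eps):
    """Chance that |r/n - p| exceeds eps."""
    return sum(binomial_term(n, r, p) for r in range(n + 1)
               if abs(r / n - p) > eps)

for n in (100, 1000, 10000):
    print(n,
          prob_discrepancy_at_most(n, 0.5, 5),
          prob_relative_discrepancy_exceeds(n, 0.5, 0.05))
```

Both columns shrink as n grows: a fixed bound on the discrepancy becomes ever harder to keep, while a fixed bound on the relative discrepancy becomes ever harder to break.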
This  theorem  is  always  regarded  as  central  in  the  whole 
doctrine  of  probability,  and  although  it  emphatically  does  not 
tell  us  anything  as  to  how  events  have  to  occur,  it  is,  under 
the  conditions  of  our  first  empirical  assumption,  exceedingly 
illuminating   as   to   the   way  that   they   usually   occur.     Of 
course, to such a writer as Keynes, to whom mathematical
probability  appears  but  a  subsidiary  part  of  the  whole  subject, 
Bernoulli's  theorem  is  of  secondary  importance,*  but  in  any 
objective  treatment  it  must  be  fundamental.     We  shall  later 
suggest another much shorter proof, which depends, unfortunately,
upon the use of approximate expressions to be
developed presently. The proof given above is, perhaps, new.
Bernoulli himself considered only the case of the relative
discrepancy, his statement being as follows:†

Sit igitur numerus casuum fertilium ad numerum sterilium
vel praecise, vel proxime in ratione r/s adeoque ad
numerum omnium in ratione r/(r+s) seu r/t quam rationem
terminent limites (r+1)/t, (r−1)/t. Ostendendum est, tot posse
capi experimenta, ut datis quotlibet (puta c) vicibus, verisimilius
evadat numerum fertilium observationum intra hos
limites quam extra casurum esse h.e. numerum fertilium ad
numerum omnium observationum rationem habiturum nec
majorem quam (r+1)/t nec minorem quam (r−1)/t.

* Keynes, loc. cit., pp. 336-45.

† Jacobus Bernoulli, Ars Conjectandi, Basle, 1713, p. 236.

§  2.    Stirling's  Formula. 

In  our  formula  (1)  for  repeated  trials,  as  well  as  in  many 
formulae  of  an  elementary  nature,  we  have  to  do  with 
factorials.  These  are  easy  to  write,  and  not  hard  to  evaluate 
when  the  numbers  involved  are  small,  but  can  become 
exceedingly  difficult  to  estimate  when  the  highest  factor  is 
large.  It  is  therefore  extremely  useful  to  be  able  to  replace 
them  by  approximate  values. 

What  do  we  mean  by  an  approximate  value  for  an  ex- 
pression ?  Ordinarily  these  words  signify  another  expression, 
differing from the first by a small quantity; so small, in fact,
that  its  presence  may  be  overlooked.  Unfortunately  we  have 
no  such  scheme  for  approximating  to  factorials.  But  when 
a  function  of  a  certain  argument  increases  indefinitely  with 
that  argument,  then  a  new  function  bearing  to  the  first 
a  ratio  that  approaches  unity  as  a  limit  may  be  used  to 
replace  it  in  the  sense  that  the  error  will  bear  an  infinitesimal 
ratio  to  the  function  itself.  The  difference  between  the  two 
functions  may  actually  increase  indefinitely,  but  if  their  ratio 
approach  unity  as  a  limit,  then  in  ratio  problems  the  one 
may  be  used  safely  as  an  approximate  representation  of  the 
other. For instance, the difference between the two functions
of x, $x^2$ and $x^2 + x$, increases indefinitely, but their ratio
approaches 1 as a limit. A function whose ratio to a given
function approaches 1 as a limit, as the two increase indefinitely,
is called an asymptotic expression for the given
function. It is our present task to find such an expression
for n!. We shall do so by guessing at various factors, till
we  reach  a  form  where  the  unknown  factor  may  be  treated  as 
a  constant. 

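As a first approximation, let us note that n! has n factors,
the largest of which is n. We therefore start with the crude
assumption

$$n! = n^n \phi(n), \qquad (n+1)! = (n+1)^{n+1}\phi(n+1).$$

Dividing,

$$\frac{\phi(n)}{\phi(n+1)} = \left(1 + \frac{1}{n}\right)^{n},$$

$$\lim_{n \to \infty} \frac{\phi(n+1)}{\phi(n)} = e^{-1}, \qquad \lim_{n \to \infty} \frac{\phi(n+k)}{\phi(n)} = e^{-k}.$$

This suggests a second factor, and we write

$$n! = n^n e^{-n}\psi(n), \qquad \frac{\psi(n)}{\psi(n+1)} = \frac{1}{e}\left(1 + \frac{1}{n}\right)^{n},$$

$$\log n^{1/2} - \log (n+1)^{1/2} = -\log\left(1 + \frac{1}{n}\right)^{1/2},$$

$$\lim_{n \to \infty} \frac{\log \dfrac{\psi(n)}{\psi(n+1)}}{\log \left(\dfrac{n}{n+1}\right)^{1/2}} = 1.$$

This leads to our next approximation

$$n! = n^n e^{-n} n^{1/2} F(n),$$

$$\frac{F(n)}{F(n+1)} = \frac{1}{e}\left(1 + \frac{1}{n}\right)^{n + \frac12},$$

$$\log F(n) - \log F(n+1) = \left(n + \frac12\right)\left(\frac{1}{n} - \frac{1}{2n^2} + \frac{1}{3n^3} - \dots\right) - 1 = \frac{1}{12n^2} - \frac{1}{12n^3} + \dots.$$

The series on the right is convergent, and is composed of
terms which are alternately positive and negative. The sum
of a number of terms is alternately greater than or less than
the limit. Hence

$$\log F(n) > \log F(n+1) > \log F(n) - \frac{1}{12n^2},$$

$$\log F(n) > \log F(\infty) > \log F(n) - \frac{1}{12}\left[\frac{1}{n^2} + \frac{1}{(n+1)^2} + \dots\right]$$

$$> \log F(n) - \frac{1}{12}\left[\frac{1}{(n-1)n} + \frac{1}{n(n+1)} + \frac{1}{(n+1)(n+2)} + \dots\right].$$

The convergent series on the right may be written

$$\left(\frac{1}{n-1} - \frac{1}{n}\right) + \left(\frac{1}{n} - \frac{1}{n+1}\right) + \left(\frac{1}{n+1} - \frac{1}{n+2}\right) + \dots = \frac{1}{n-1},$$

$$\log F(n) > \log F(\infty) > \log F(n) - \frac{1}{12(n-1)}.$$

Again

$$\log\left(1 + \frac{1}{10n}\right) > \frac{1}{10n} - \frac{1}{200n^2} > \frac{1}{12(n-1)}$$

when n > 6. Hence

$$\log F(n) > \log F(\infty) > \log \frac{F(n)}{1 + \dfrac{1}{10n}}.$$

If thus, when n > 6 we replace F(n) by F(∞), we have
divided by a factor lying between the limits 1 and $1 + \dfrac{1}{10n}$.

There remains only the task of finding the value of the
constant F(∞). We do this by a roundabout method.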

Let

$$I(m) = \int_0^{\pi/2} \sin^m x\,dx,$$

$$I(2m) > I(2m+1) > I(2m+2),$$

$$1 > \frac{I(2m+1)}{I(2m)} > \frac{I(2m+2)}{I(2m)}.$$

It will appear in the course of our work that

$$\lim_{m \to \infty} \frac{I(2m+2)}{I(2m)} = 1.$$

$$\int_0^{\pi/2} \sin^n x\,dx = -\left[\sin^{n-1} x \cos x\right]_0^{\pi/2} + (n-1)\int_0^{\pi/2} \sin^{n-2} x \cos^2 x\,dx$$

$$= \frac{n-1}{n}\int_0^{\pi/2} \sin^{n-2} x\,dx,$$

$$I(2m) = \frac{2m-1}{2m} \cdot \frac{2m-3}{2(m-1)} \dots \frac12 \int_0^{\pi/2} dx = \frac{2m-1}{2m} \cdot \frac{2m-3}{2(m-1)} \dots \frac12 \cdot \frac{\pi}{2},$$

$$I(2m+1) = \frac{2m}{2m+1} \cdot \frac{2(m-1)}{2m-1} \dots \frac23 \int_0^{\pi/2} \sin x\,dx = \frac{2m}{2m+1} \cdot \frac{2(m-1)}{2m-1} \dots \frac23 \cdot 1,$$

$$\frac{I(2m+1)}{I(2m)} = \frac{[2m \cdot 2(m-1) \dots 2]^2}{(2m+1)\,[(2m-1)(2m-3) \dots 1]^2} \cdot \frac{2}{\pi}$$

$$= \frac{[2m \cdot 2(m-1) \dots 2]^4}{(2m+1)\,[(2m)!]^2} \cdot \frac{2}{\pi}$$

$$= \frac{2^{4m}\,m^{4m}\,e^{-4m}\,(\sqrt{m})^4\,[F(m)]^4}{(2m+1)\,(2m)^{4m}\,(e^{-2m})^2\,(\sqrt{2m})^2\,[F(2m)]^2} \cdot \frac{2}{\pi}$$

$$= \frac{m}{(2m+1)\,\pi} \cdot \frac{[F(m)]^4}{[F(2m)]^2}.$$

Passing to the limit

$$1 = \frac{F(\infty)^2}{2\pi},$$

$$F(\infty) = \sqrt{2\pi}.$$

Stirling's formula:* If the expression n! be replaced by the
expression $n^n e^{-n}\sqrt{2\pi n}$ the true value will have been
divided by a number lying between 1 and $1 + \dfrac{1}{10n}$.

* Stirling, Methodus differentialis, &c., London, 176^, p. 135.
A  table  of  the  values  of  loge^  and  loge~^  will  be  found  at
the  end  of  the  volume. 

We  shorten  this  formula  by  writing  the  untrue  equation 


$$n! = n^n e^{-n}\sqrt{2\pi n}. \tag{8}$$

The  development  above  shows  that  we  should  be  more  accu- 
rate if  we  wrote 


$$n! = n^n e^{-n}\sqrt{2\pi n}\left(1 + \frac{1}{12n}\right).$$


The  gain  in  accuracy  is,  for  most  purposes,  not   worth   the 
additional  complication.     For  instance,  we  find  * 


$$10! = 3{,}628{,}800, \qquad 10^{10} e^{-10}\sqrt{20\pi} = 3{,}598{,}699.$$

Difference = 30,101. Ratio = 1.008.
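
The comparison is quickly repeated for other values of n. The sketch below is an illustration; the function name `stirling` is ours, and the bound $1 + 1/(10n)$ is the one obtained above for n > 6.

```python
from math import e, factorial, pi, sqrt

def stirling(n):
    """Approximate value n^n e^(-n) sqrt(2 pi n) from the untrue equation (8)."""
    return n ** n * e ** (-n) * sqrt(2 * pi * n)

ratios = {}
for n in (7, 10, 20, 50):
    ratios[n] = factorial(n) / stirling(n)
    # true value / approximation, to be compared with the bound 1 + 1/(10n)
    print(n, ratios[n], 1 + 1 / (10 * n))
```

In every case tried the ratio lies between 1 and the stated bound, and approaches 1 as n grows.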

As an example of the use of Stirling's formula, let us calculate
the probability of a zero discrepancy. We have by (1)
and (8)

$$\frac{n!}{(np)!\,(nq)!}\,p^{np} q^{nq} = \frac{n^n e^{-n}\sqrt{2\pi n}\;p^{np} q^{nq}}{(np)^{np}\,(nq)^{nq}\,e^{-n(p+q)}\,2\pi n\sqrt{pq}} = \frac{1}{\sqrt{2\pi npq}}. \tag{9}$$

This expression decreases indefinitely as n increases, which
gives an immediate proof of Theorem 2] and of Bernoulli's theorem.
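
Formula (9) can be set beside the exact value from (1). The sketch below is an illustration; the function names are ours, and the exact calculation assumes that np is a whole number, as it is for a coin thrown 100 times.

```python
from math import comb, pi, sqrt

def zero_discrepancy_exact(n, p):
    """Formula (1) with r = np; assumes np is a whole number."""
    r = round(n * p)
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def zero_discrepancy_approx(n, p):
    """Formula (9): 1 / sqrt(2 pi n p q)."""
    return 1 / sqrt(2 * pi * n * p * (1 - p))

exact = zero_discrepancy_exact(100, 0.5)
approx = zero_discrepancy_approx(100, 0.5)
print(exact, approx)   # chance of exactly 50 heads in 100 throws
```

For 100 throws the two values agree to about a quarter of one per cent.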

* Czuber, Wahrscheinlichkeitsrechnung, cit. p. 24.

Problem.

A coin is thrown 100 times; calculate the probability that it will
show exactly 50 heads and 50 tails.

§ 3. The Probability Integral.

We have seen, by two different methods, that as n increases,
the probability of any one discrepancy, even the most likely,
approaches 0 as a limit. In order, therefore, to concern ourselves
with probabilities of finite magnitude, it is wise to
change our problem, and calculate the probability that the
discrepancy shall lie within specified limits. We are immediately
faced by the question, What will be limits of a reasonable
size? We have seen that the average value of the square
of the discrepancy is proportional to n, and this suggests the
propriety of calculating the probability of a discrepancy lying
between two different constant multiples of $\sqrt{n}$. In particular,
let us calculate the probability of a discrepancy lying between
$z_1\sqrt{2npq}$ and $z_2\sqrt{2npq}$, including the limits.
We first revert to (1), putting

$$r = np + z\sqrt{2npq}, \qquad n - r = nq - z\sqrt{2npq}.$$

We seek

$$\sum_{z=z_1}^{z=z_2} \frac{n!}{(np + z\sqrt{2npq})!\,(nq - z\sqrt{2npq})!}\;p^{\,np + z\sqrt{2npq}}\;q^{\,nq - z\sqrt{2npq}}.$$

The discrepancy increases by 1 each time, and the expression
above is the sum of a number of ordinates at unit intervals,
and so the sum of the areas of a system of rectangles of unit
base and varying altitude. We wish to find the limit of this
sum as n increases indefinitely.

The next point to note is that as n increases, instead of
imagining the height of each rectangle to decrease toward 0
while the total base length increases indefinitely, we might
equally well imagine that the total base $(z_2 - z_1)\sqrt{2npq}$
remains constant, while the bases of the individual rectangles
approach 0 as a limit, so that the sum of the areas of
the rectangles approaches the area under a smooth curve as a
limit. Our problem is to find the nature of this curve. We
have, essentially, an infinite sum of infinitesimal terms, and
Duhamel's theorem tells us that we may replace each by
another infinitesimal bearing to it a ratio which approaches 1
as a limit.* In fact, it will be wise to split each of the
various quantities to be summed into factors and replace each
factor in this way. As a first substitution, we replace the
various factorials by their equivalents from Stirling's formula
(8), getting

$$\sum_{z=z_1}^{z=z_2} \frac{n^n e^{-n}\sqrt{2\pi n}\;p^{\,np + z\sqrt{2npq}}\;q^{\,nq - z\sqrt{2npq}}}{(np + z\sqrt{2npq})^{np + z\sqrt{2npq}}\,(nq - z\sqrt{2npq})^{nq - z\sqrt{2npq}}\;2\pi e^{-n}\sqrt{(np + z\sqrt{2npq})(nq - z\sqrt{2npq})}}$$

$$= \sum_{z=z_1}^{z=z_2} \sqrt{\frac{n}{2\pi\,(np + z\sqrt{2npq})(nq - z\sqrt{2npq})}}\left(\frac{np}{np + z\sqrt{2npq}}\right)^{np + z\sqrt{2npq}}\left(\frac{nq}{nq - z\sqrt{2npq}}\right)^{nq - z\sqrt{2npq}}.$$

* This theorem is at the basis of the integral calculus. See e.g. Osgood,
Differential and Integral Calculus, New York, 1907, p. 164; Annals of Mathematics,
Series 2, vol. iv.

We next note that as the quantity $z\sqrt{2npq}$ increases by
unity each time, we may properly say that z has an increment
Δz where

$$\Delta z = \frac{1}{\sqrt{2npq}},$$

$$\lim_{n \to \infty} \sqrt{\frac{n}{2\pi\,(np + z\sqrt{2npq})(nq - z\sqrt{2npq})}} \Bigg/ \frac{\Delta z}{\sqrt{\pi}} = 1.$$

The ratio approaches 1 uniformly as we see by expanding.

$$\log\left(\frac{np + z\sqrt{2npq}}{np}\right)^{np + z\sqrt{2npq}} + \log\left(\frac{nq - z\sqrt{2npq}}{nq}\right)^{nq - z\sqrt{2npq}}$$

$$= \log\left[\left(\frac{np + z\sqrt{2npq}}{np}\right)^{np + z\sqrt{2npq}}\left(\frac{nq - z\sqrt{2npq}}{nq}\right)^{nq - z\sqrt{2npq}}\right]$$

$$= z^2 + \frac{1}{\sqrt{n}}\,F(z).$$

Hence

$$\lim_{n \to \infty} \left(\frac{np}{np + z\sqrt{2npq}}\right)^{np + z\sqrt{2npq}}\left(\frac{nq}{nq - z\sqrt{2npq}}\right)^{nq - z\sqrt{2npq}} \Bigg/ e^{-z^2} = 1.$$

Since F(z) remains less than a fixed amount, the ratio
approaches 1 uniformly.
We therefore seek

$$\lim \sum_{z=z_1}^{z=z_2} \frac{1}{\sqrt{\pi}}\,e^{-z^2}\,\Delta z,$$

and this, by the fundamental theorem of the integral calculus,
is*

$$\frac{1}{\sqrt{\pi}} \int_{z_1}^{z_2} e^{-z^2}\,dz.$$

This formula may be made slightly more accurate in the
following fashion. When a large number of rectangles is
replaced by a curve, the number of rectangles is less by one
than the number of points where the curve meets an upright
side of a rectangle. When we replace

$$\sum f(z)\,\Delta z \quad \text{by} \quad \int f(z)\,dz,$$

if, as is perfectly legitimate, the value of f be taken for z at
the left end of the interval, then the term $f(z_2)\,\Delta z$ is lost in
passing to the integral, so that a more accurate form for the
probability will be

$$\frac{1}{\sqrt{\pi}} \int_{z_1}^{z_2} e^{-z^2}\,dz + \frac{e^{-z_2^2}}{\sqrt{\pi}}\,\Delta z.$$

The  number  of  consequences  deducible  from  these  formulae 
is  very  large.  Let  us  first  find  the  probability  that  the 
numerical  value  of  the  discrepancy  shall  not  exceed  d. 

$$z_2\sqrt{2npq}=-z_1\sqrt{2npq}=d.$$

This gives us
Laplace's theorem †] If the probability for success be p, and
that for failure q = 1 - p, then the probability that in n
trials the discrepancy will not exceed numerically the
number d is nearly equal to

$$\frac{2}{\sqrt{\pi}}\int_{0}^{d/\sqrt{2npq}}e^{-z^{2}}\,dz, \qquad (10)$$

* Ibid., p. 155.

† Laplace, Œuvres, vol. vii, Paris, p. 284. For an estimate of the size of
the error committed by using these formulae see Castelnuovo, loc. cit.,
pp. 83 ff. The development which we have given follows the general lines of
Markoff, loc. cit.
and is still more nearly equal to

$$\frac{2}{\sqrt{\pi}}\int_{0}^{d/\sqrt{2npq}}e^{-z^{2}}\,dz+\frac{1}{\sqrt{2\pi npq}}\,e^{-\frac{d^{2}}{2npq}}. \qquad (11)$$
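A small numerical check may make the difference between (10) and (11) concrete. The following is only a sketch under our own assumptions: it uses the Python standard library, the function names are ours, and the trial values n = 100, p = 1/2, d = 5 are chosen purely for illustration. The integral in (10) is evaluated with `math.erf`, which is the same function as the integral with the factor $2/\sqrt{\pi}$.

```python
import math

def exact_within(n, p, d):
    """Exact binomial probability that |successes - n*p| <= d."""
    q = 1.0 - p
    lo = max(0, math.ceil(n * p - d))
    hi = min(n, math.floor(n * p + d))
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(lo, hi + 1))

def laplace_10(n, p, d):
    """Formula (10): (2/sqrt(pi)) * integral from 0 to d/sqrt(2npq) of exp(-z^2)."""
    return math.erf(d / math.sqrt(2 * n * p * (1 - p)))

def laplace_11(n, p, d):
    """Formula (11): formula (10) plus the correction term."""
    npq = n * p * (1 - p)
    correction = math.exp(-d * d / (2 * npq)) / math.sqrt(2 * math.pi * npq)
    return laplace_10(n, p, d) + correction

n, p, d = 100, 0.5, 5          # illustrative values, not taken from the text
print(exact_within(n, p, d), laplace_10(n, p, d), laplace_11(n, p, d))
```

For these values the corrected expression lies much closer to the exact sum than the uncorrected one, which is what the argument about the lost rectangle leads us to expect.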

It cannot be too often emphasized that these are merely
approximate expressions for the quantities desired. The
integrand is an even function, suggesting that equal positive
and negative discrepancies are equally likely, and this is not
the case. When p < q, there are positive discrepancies possible
which are much greater than any possible negative discrepancies.
Let us push this a little further, before we leave the
exact formulae for ever. The probabilities for a discrepancy
d or -d are, by (1),

$$\frac{n!}{(np+d)!\,(nq-d)!}\,p^{np+d}q^{nq-d},\qquad \frac{n!}{(np-d)!\,(nq+d)!}\,p^{np-d}q^{nq+d}.$$

The ratio of the second to the first is

$$\frac{(np+d)!\,(nq-d)!}{(np-d)!\,(nq+d)!}\left(\frac{q}{p}\right)^{2d}=\left[\frac{npq+qd}{npq+pd}\right]\left[\frac{npq+q(d-1)}{npq+p(d-1)}\right]\cdots\left[\frac{npq-q(d-1)}{npq-p(d-1)}\right].$$

When p < q, and d = 1, this expression is greater than 1.
Further, when d is increased by unity, the product of the two
factors multiplied in is greater than one, as long as
$npq > d(d-1)$. Hence the expression is surely greater than
one for these values of d, and we get

Theorem 3] When the probability for success is less than one
half, the probability for each small possible positive
discrepancy is less than that for the corresponding
negative discrepancy; the ratio of the two, however,
approaches 1 as a limit as the number of trials increases
indefinitely.
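The inequality of Theorem 3] can be checked with exact rational arithmetic. The sketch below is ours, not the author's: it assumes Python's `fractions` module, the helper name `discrepancy_prob` is hypothetical, and the values n = 30, p = 1/3 (so that np = 10 and npq = 20/3) are picked only so that np is a whole number.

```python
from fractions import Fraction
from math import comb

def discrepancy_prob(n, p, d):
    """Exact probability of a discrepancy d, i.e. of n*p + d successes."""
    k = n * p + d                      # number of successes; must be whole
    assert k.denominator == 1 and 0 <= k <= n
    k = int(k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 30, Fraction(1, 3)              # illustrative values: np = 10, npq = 20/3
for d in (1, 2, 3):                    # each satisfies d*(d-1) < npq
    ratio = discrepancy_prob(n, p, -d) / discrepancy_prob(n, p, d)
    print(d, ratio, float(ratio))
```

Each printed ratio is that of the probability for the negative discrepancy to the probability for the positive one, computed directly from the exact formula rather than from the product of bracketed factors.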

This  may  also  be  surmised  from  the  fact  that  the  average 
discrepancy  is  zero,  and  as  there  are  in  the  present  case  more 
possible  positive  discrepancies  than  negative  ones,  it  would 
seem  natural  that  in  every  case  the  probability  for  the  positive 
discrepancy would be less than that for the corresponding
negative.* 

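Let us return definitely to our approximate formulae. The
function

$$\Theta(x)=\frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-z^{2}}\,dz \qquad (12)$$

is absolutely fundamental in the theory of probability. A
table of the values of this function will be found on pp. 209-13.
Let us find its value when $x\to\infty$. We wish to evaluate

$$u=\frac{2}{\sqrt{\pi}}\int_{0}^{\infty}e^{-z^{2}}\,dz.$$

Let $z=ax$,

$$u=\frac{2a}{\sqrt{\pi}}\int_{0}^{\infty}e^{-(ax)^{2}}\,dx,$$

$$\frac{4}{\pi}\int_{0}^{\infty}a\,da\int_{0}^{\infty}e^{-(1+x^{2})a^{2}}\,dx=u\cdot\frac{2}{\sqrt{\pi}}\int_{0}^{\infty}e^{-a^{2}}\,da=u^{2}.$$

We may reverse the order of integrations on the left, hence

$$u^{2}=\frac{4}{\pi}\int_{0}^{\infty}dx\int_{0}^{\infty}e^{-(1+x^{2})a^{2}}\,a\,da.$$

The integral with respect to a is

$$\left[-\frac{1}{2(1+x^{2})}\,e^{-(1+x^{2})a^{2}}\right]_{0}^{\infty}=\frac{1}{2(1+x^{2})},$$

$$u^{2}=\frac{2}{\pi}\int_{0}^{\infty}\frac{dx}{1+x^{2}}=1,$$

$$\frac{2}{\sqrt{\pi}}\int_{0}^{\infty}e^{-z^{2}}\,dz=1. \qquad (13)$$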
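Formula (13) and the function (12) may be tested numerically. The sketch below rests on our own assumptions: a simple midpoint rule with an arbitrarily chosen number of steps, a helper named `theta` standing for the function of (12), and a comparison with `math.erf` from the Python standard library, which computes the same integral.

```python
import math

def theta(x, steps=20000):
    """Midpoint-rule estimate of (2/sqrt(pi)) * integral_0^x exp(-z^2) dz."""
    h = x / steps
    total = sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(steps))
    return 2.0 / math.sqrt(math.pi) * total * h

for x in (0.4769, 1.0, 3.0, 6.0):
    print(x, theta(x), math.erf(x))
```

The values approach 1 very rapidly as x grows, in agreement with (13); the argument 0.4769 is included because it is the one at which the function takes the value one half.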


At  this  point  the  careless  reader  might  be  led  into  a  very 
grievous  mathematical  error  by  supposing  that  this  formula 
was  self-evident.  He  might  say,  '  The  probability  that  the 
discrepancy will take some value between $-\infty$ and $\infty$ is

*  Cf.  Simmons,  '  A  New  Theorem  in  Probability ',  Proceedings  London  Math. 
Soc,  vol.  xxvi,  1895. 
equal to 1. This is the probability obtained from (10) by
letting  d  become  infinite,  which  proves  (13)  without  more 
ado.'  But  we  must  repeat  again  and  again  ad  nauseam 
that  (10)  is  merely  an  approximate  formula,  for  the  following 
reasons : 

(1)  The  integrand  is  an  even  function,  whereas  we  saw  in 
theorem  3]  that,  in  the  general  case,  equal  positive  and  nega- 
tive discrepancies  are  not  equally  likely. 

(2)  We  started  with  the  assumption  that  we  were  seeking 
the  probability  for  a  discrepancy  between  two  limits  propor- 
tional to  the  square  root  of  the  number  of  trials.  We  obtained 
an  approximate  formula  good  for  such  limits.  This  formula 
gives  a  finite  if  extremely  small  probability  for  a  discrepancy 
of  every  numerical  magnitude,  whereas  it  is  physically  impos- 
sible to  have  a  discrepancy  numerically  greater  than  the 
larger  of  the  two  numbers  np,  nq. 

The verification of formula (13) by means of the approximate
formula (10) must be looked upon as a fortunate
accident.

Strangely enough there is another such accidental verification
which we now proceed to establish. We apply the law of the
mean to (10) getting, as the probability for a discrepancy close
to x, the expression

$$\frac{1}{\sqrt{\pi}}\,e^{-\frac{x^{2}}{2npq}}\,\frac{dx}{\sqrt{2npq}}. \qquad (14)$$

The  expectation  of  a  man  who  will  receive  a  sum  equal  to 
the  square  of  the  discrepancy,  i.  e.  the  limit  of  the  average 
value  of  that  square  if  formula  {10)  were  universally  valid, 
would  be 


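$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}x^{2}e^{-\frac{x^{2}}{2npq}}\,\frac{dx}{\sqrt{2npq}}=\frac{1}{\sqrt{2n\pi pq}}\int_{0}^{\infty}2x^{2}e^{-\frac{x^{2}}{2npq}}\,dx.$$

Let

$$\frac{x}{\sqrt{2npq}}=t,\qquad \frac{dx}{\sqrt{2npq}}=dt.$$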


Problem. 

Prove  Bernoulli's  Theorem  by  means  of  (10)  and  (13). 


$$\text{Average}=\frac{2npq}{\sqrt{\pi}}\int_{0}^{\infty}2t^{2}e^{-t^{2}}\,dt.$$

Let $2te^{-t^{2}}\,dt=dv$, $t=u$.

Since $te^{-t^{2}}$ vanishes at both limits, if we integrate by parts
we find:

$$\text{Average}=\frac{2npq}{\sqrt{\pi}}\int_{0}^{\infty}e^{-t^{2}}\,dt=npq.$$

This,  by  a  fortunate  accident,  checks  exactly  with  (5). 

Let  us  calculate  the  expectation  of  a  man  who  will  receive 
a  sum  equal  to  the  numerical  value  of  the  discrepancy.  In 
this  case  the  exact  formula  is  hard  to  manipulate ;  we  there- 
fore, emboldened  by  recent  success,  take  formula  (10)  as  though 
it  were  exact  and  universally  valid,  knowing  that  it  is  always 
near  the  truth.     We  have 


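$$\frac{2}{\sqrt{2npq\pi}}\int_{0}^{\infty}xe^{-\frac{x^{2}}{2npq}}\,dx=\frac{\sqrt{2npq}}{\sqrt{\pi}}\int_{0}^{\infty}2te^{-t^{2}}\,dt=\sqrt{\frac{2}{\pi}}\,\sqrt{npq}.$$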
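The two expectations just found can be compared with the exact binomial values. This is a sketch only, under assumptions of our own: the function name `discrepancy_moments` is hypothetical, exact fractions are used so that the first comparison is free of rounding, and n = 100, p = 1/2 are illustrative values.

```python
from fractions import Fraction
from math import comb, sqrt, pi

def discrepancy_moments(n, p):
    """Exact mean of d^2 and of |d| for the discrepancy d = successes - n*p."""
    q = 1 - p
    mean_sq = Fraction(0)
    mean_abs = Fraction(0)
    for k in range(n + 1):
        prob = comb(n, k) * p**k * q**(n - k)
        d = k - n * p
        mean_sq += prob * d * d
        mean_abs += prob * abs(d)
    return mean_sq, mean_abs

n, p = 100, Fraction(1, 2)             # illustrative values, not from the text
mean_sq, mean_abs = discrepancy_moments(n, p)
npq = n * p * (1 - p)
print(mean_sq, npq)                                 # exact mean square against npq
print(float(mean_abs), sqrt(2 * float(npq) / pi))   # exact mean of |d| against the approximation
```

The mean square agrees exactly with npq, while the mean numerical value agrees with $\sqrt{2/\pi}\,\sqrt{npq}$ only approximately, as the use of the approximate formula would lead us to expect.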

What is the value of the numerical discrepancy which
there is a half chance of reaching? We have, by (12),

$$\Theta\left(\frac{d}{\sqrt{2npq}}\right)=\frac{1}{2}.$$

Now, by our table, p. 209,

$$\Theta(0.4769)=\tfrac{1}{2}.$$

Hence

$$d=0.4769\,\sqrt{2}\,\sqrt{npq}=0.674\sqrt{npq}.$$

Let us recapitulate these last results. The square root of
the average value of the square of the discrepancy is called
the mean discrepancy; its value is

$$\sqrt{npq} \qquad (15)$$


Problem. 

Calculate  by  the  same  two  methods  the  average  value  of  the  fourth 
power  of  the  discrepancy. 

The average value of the numerical value of the discrepancy
is called the average discrepancy; its value is

$$0.798\sqrt{npq} \qquad (16)$$

The positive number which there is a half chance that the
numerical value of the discrepancy will not exceed is called
the probable discrepancy; its value is

$$0.674\sqrt{npq} \qquad (17)$$

How accurate has Bernoulli's theorem turned out in practice?
There is a good deal of testimony on this point, generally
highly unsatisfactory. Karl Pearson made an analysis of a large
number of statistics from the roulette games at Monte Carlo,
and came to the conclusion that whereas the alternation of
red and black was satisfactory, there was an incredible excess
of long runs.* Exactly opposite conclusions were reached by
Marbe,† who maintained that, on the contrary, short runs
were predominantly the rule. Then Grünwald took up
Marbe's work and showed, at least to his own satisfaction,
that the apparent result was due to faulty grouping of the
observations.‡ By a proper re-grouping, the results showed
a very satisfactory proportion. An account of the work of
Marbe and Grünwald is given by Czuber;§ the average reader
will strongly suspect that Grünwald was right, and the other
two wrong.

Example 3] The philosopher Buffon ‖ one day threw a coin
4,040 times, and noted that heads arrived 2,048 times.
Is this remarkable?

In this case p and q are both 1/2, n = 4,040, d = 28.
The chance for discrepancy of this size or less is

$$\Theta\left(\frac{28}{\sqrt{2020}}\right)=0.622.$$
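The arithmetic of this example is easily repeated. The lines below are a sketch under our own assumptions: the tabulated function is replaced by `math.erf` from the Python standard library, which evaluates the same integral, and the variable names are ours.

```python
import math

n, heads = 4040, 2048
p = q = 0.5
d = abs(heads - n * p)                       # discrepancy, here 28
chance = math.erf(d / math.sqrt(2 * n * p * q))
print(d, round(chance, 3))
```

The complement of this chance is the probability for a numerically larger discrepancy.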


* Pearson, The Chances of Death, vol. i, London, 1897.

† Marbe, Naturphilosophische Untersuchungen zur Wahrscheinlichkeitslehre, Leipzig,
1899.

‡ Grünwald, Isolierte Gruppen und die Marbesche Zahl p, Würzburg, 1904. I
have not been able to verify these two references taken from Czuber.

§ Wahrscheinlichkeitsrechnung, cit. vol. i, pp. 144 ff.

‖ Bertrand, loc. cit., p. 9.
There are, hence, nearly four chances in ten for a discrepancy
numerically  larger  than  the  one  obtained,  and  the  result 
must  be  looked  upon  as  not  unnatural.  The  probability  for 
obtaining exactly this discrepancy is less than that for a
discrepancy 0, which latter is

1  1 

< 


Example 4] Two men, each with five pistoles, toss a coin. The
first player wins if it show heads, the second if it show
tails, the loser owing the winner one pistole. This
money is not paid immediately, but an account is kept,
the balance to be paid at the end of the game. In how
many turns will there be an even chance that the loser
is more in debt to the winner than he can pay?
Here, again, the chances are 1/2 for each. In how many
turns will the probable discrepancy be 5? Applying (17) we
have

$$0.675\sqrt{n/4}=5,$$

$$n=220.$$
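The solution of this little equation may be written out as follows. It is a sketch only; the variable names are ours, and the coefficient 0.675 is the one used in the worked example, whereas (17) gives 0.674.

```python
import math

coefficient = 0.675          # value used in the worked example
target = 5                   # each man's five pistoles
p = q = 0.5

# probable discrepancy = coefficient * sqrt(n*p*q) = target, solved for n
n_exact = (target / coefficient) ** 2 / (p * q)
print(n_exact, math.ceil(n_exact))
```

The number of turns must be whole, so the exact solution is rounded up.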

It  should  be  noted  here  that,  if  the  loser  had  paid  cash  each 
time,  there  would  be  more  than  a  half  chance  that  one  player 
would  be  ruined  before  now,  for  a  discrepancy  of  5  at  the  end 
of  220  turns  is  compatible  with  a  larger  discrepancy  at  an 
earlier  stage  of  the  game. 

Example 5] How many times must a die be thrown to produce
a probability of jq that the ratio of the number

Problems.

1. In 1850 the Swiss astronomer Wolff threw two dice 100,000 times.
The two showed the same face 16,647 times. Comment on this result.

2. The game of 'craps' was explained in a problem on p. 21. Prof. Bancroft
Brown (American Mathematical Monthly, vol. xxvi, 1919) has tabulated
the results of 9,900 turns, where the two players won 4,871 times and
5,029 times. Is this result surprising?

3. Discuss the results reached by Lieut. Hoar in dealing five cards, as
given in the problem p. 21, from the point of view of Bernoulli's theorem.

4. How many times must a coin be thrown in order that there may be
9 chances in 10 that the discrepancy is numerically greater than 5?

5. Two dice are thrown 100 times. What is the probable discrepancy
in the number of times that the sum 7 appears?

of 5's shown shall bear to the number of trials a value
between 3*^ and -^^ ?
Here we have a relative discrepancy of ± ^^^, hence


§  4.    Games  of  Chance.* 

There  is  a  very  real  difficulty  involved  in  handling  some  of 
the  fundamental  problems  arising  in  games  of  chance  owing 
to  the  presence  of  a  subtle  but  important  psychological 
element  which  cannot  be  well  stated  in  mathematical  terms. 
Gamblers  are  notoriously  superstitious,  which  means  that 
they  are  irrational,  still  more  are  they  unmathematical.  For 
that  reason,  the  assumptions  which  are  made  are  of  a  tentative 
nature,  and  only  partially  represent  the  real  facts. 

In  the  games  of  chance  of  the  type  which  we  shall  consider 
there  are  two  individuals,  whom  we  shall  call  the  '  Player ' 
and  the  '  Banker '  respectively.  The  former  has  a  considerable 
freedom  in  deciding  upon  the  amount  that  he  will  stake,  but 
we  shall  assume  that  there  is  an  upper  limit  to  this,  as,  other- 
wise, a  syndicate  of  players  with  a  large  capital  might  be 
formed,  and  this  body  might  keep  on  doubling  the  stakes 
after  each  loss,  till  the  Banker  was  ruined.  We  shall  assume 
that the Player has a fortune A, and that he intends to play
until  either  he  has  lost  this  amount,  or  won  the  sum  B  from 
the  Banker.  If  the  player  have  any  wisdom  at  all,  A  will  be 
less  than  his  total  fortune,  and  B  far  less  than  the  sum  of  the 
Banker's  quick  assets ;  the  word  '  ruin '  has  only  this  technical 
sense  for  us. 

Let the Player's chance to win an individual turn be p, the
Banker's chance q = 1 - p, a tie being excluded. Let P be
the  Player's  chance  to  ruin  the  Banker,  Q  the  chance  that  the 
Banker  will  ruin  the  Player.  We  must  begin  by  showing 
that  the  sum  of  these  two  is  1,  i.e.  that  there  is  no  finite 
probability  that  the  game  will  continue   indefinitely.     The 

* The greater part of the present section will be found in an article by the
author, 'The Gambler's Ruin', Annals of Mathematics, vol. x, Series 2, 1909.
proof is immediate when we remember Bernoulli's theorem,
for the only way to avoid the ruin of the one or the other
party is for the discrepancy to remain below a certain fixed
limit, and we have seen that the chance for this decreases
indefinitely towards 0. If each turn be fair, in the sense
defined above, i.e. if the Player's expectation be 0 each time,
we see by Ch. II, 4] that the Player's total expectation is 0,
hence

$$PB-QA=0,\qquad P+Q=1,$$

$$P=A/(A+B),\qquad Q=B/(A+B). \qquad (18)$$

From these equations we draw two important conclusions.
First, the Player's chance is independent of the amount
staked, or the game played, provided, of course, there is no
danger that the game will not be finished for lack of time.
The wisdom in any player's setting a low figure for B is
evident. Second, who is the Player, from the Banker's point
of view? Suppose that the Banker is running a public
gaming resort, and that the game is one of pure chance, no
question of skill coming in. The Banker is supposed to be
ready to play against all comers. In most cases, they will
take opposite sides, more or less, so that his real adversary
is the surplus of those who back one chance over those who
back the opposite chance, but there is always the possibility
of a large combination of players taking the same side, in
which case the unfortunate Banker would be opposed to an
adversary of quasi infinite fortune and his ruin would be
certain. In consequence of this, all forms of public gambling
are somewhat favourable to the Banker, and our next step
must be to study the chances in games of that sort. We do
this by adopting an ingenious device, due to De Moivre,
which consists in making the game fair once more by assigning
fictitious values to the coins used.*

Let us assume that the Player has A individual coins
marked $\alpha^{0}, \alpha^{1}, \dots \alpha^{A-1}$, while the Banker's coins are marked
$\alpha^{A}, \alpha^{A+1}, \dots \alpha^{A+B-1}$. It is agreed that at each turn the
Player shall stake his a coins of highest mark, while the
Banker will reply by staking his b lowest marked coins.

*  De  Moivre,  Doctrine  of  Chances,  London,  1756,  p.  52. 



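In this fictitious system the Player's expectation before
a turn is

$$\alpha^{r}\left[p\,\frac{\alpha^{a+b}-\alpha^{a}}{\alpha-1}-q\,\frac{\alpha^{a}-1}{\alpha-1}\right]=\frac{\alpha^{r}}{\alpha-1}\left[p\alpha^{a+b}-\alpha^{a}+q\right],$$

where $\alpha^{r}$ is the lowest mark among the a coins which the Player stakes.

The game being unfavourable to the Player, we have

$$pb-qa<0.$$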
Consider the function

$$f(x)=px^{a+b}-x^{a}+q,$$

$$f'(x)=x^{a-1}\left[(a+b)px^{b}-a\right].$$

We make a short table of values

$$x=0,\qquad f(x)=q,\qquad f'(x)=0,$$

$$x=1,\qquad f(x)=0,\qquad f'(x)=pb-qa<0,$$

$$x=\left(\frac{a}{(a+b)p}\right)^{1/b}>1,\qquad f(x)=-\frac{b}{a+b}\,x^{a}+q<0,\qquad f'(x)=0,$$

$$x=\infty,\qquad f(x)=\infty,\qquad f'(x)=\infty.$$

Since we have noted all the real roots of f'(x) and between
each two roots of f there is one of f', we have just one real
root of f which is > 1. Call this $\alpha$.

$$p\alpha^{a+b}-\alpha^{a}+q=0,\qquad \alpha>1, \qquad (19)$$

$$p\alpha^{b}+q\alpha^{-a}=1. \qquad (20)$$

If $\alpha$ be given this value, we see that the expectation of the
Player, as calculated above, is 0 in this fictitious measure.
Hence, the Player's chance is the ratio of his fortune, fictitiously
calculated, to the sum of the two fortunes in the same
measure, namely

$$\frac{\alpha^{0}+\alpha^{1}+\dots+\alpha^{A-1}}{\alpha^{0}+\alpha^{1}+\dots+\alpha^{A+B-1}}=\frac{\alpha^{A}-1}{\alpha^{A+B}-1}. \qquad (21)$$

Suppose, next, that the Player determines to reduce his
stake to a/k for just one turn, the Banker simultaneously
reducing to b/k; after that, all is to go on as before. Has his
chance been improved or injured? That chance, under the
present hypothesis, is

$$p\,\frac{\alpha^{A+b/k}-1}{\alpha^{A+B}-1}+q\,\frac{\alpha^{A-a/k}-1}{\alpha^{A+B}-1}=\frac{\left(p\alpha^{b/k}+q\alpha^{-a/k}\right)\alpha^{A}-1}{\alpha^{A+B}-1}.$$

Now since $\alpha>1$, $\alpha^{1/k}<\alpha$,

$$f(\alpha^{1/k})<0,\qquad p\alpha^{b/k}+q\alpha^{-a/k}<1,$$

the chance is less than that given by (21).

The  conclusion  to  be  drawn  from  all  this  is  moral,  highly 
moral !  It  is  unwise  for  the  Player  to  reduce  his  first  stake, 
it  would  be  similarly  unwise  for  him  to  reduce  any  subsequent 
stake.  The  series  of  turns  is  bound  to  run  till  the  one  party 
or  the  other  is  ruined,  hence  we  have  the 

Fundamental theorem of games of chance] The Player's best
chance of winning a stated sum at an unfavourable
game is to stake the sum which will bring that return
in one turn. If that be not allowed, he should stake at
each turn the largest amount that the Banker will
accept.
The  practical  gambler  (if  there  be  such  a  person)  will 
probably  reply  to  this : 

'  The  player  who  stakes  his  whole  fortune  on  a  single  turn 
is  a  fool,  and  the  science  of  mathematics  cannot  prove  him  to 
be  anything  else.' 

The  answer  is  immediate  : 

'The science of mathematics never attempts the impossible,
it  merely  shows  that  other  players  are  greater  fools.' 

Let us look into certain special cases of (21). The Banker's
chance is

$$(\alpha^{A+B}-\alpha^{A})/(\alpha^{A+B}-1). \qquad (22)$$

When A and B are equal, i.e. when the Player undertakes
to win or lose a certain sum, his chance is

$$1/(\alpha^{A}+1). \qquad (23)$$

Putting $\alpha=1+\epsilon$ we see that his chance is less than

$$1/(2+A\epsilon). \qquad (24)$$

When a = b we may take each of them equal to 1; we have
then

$$\alpha=q/p. \qquad (25)$$
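For unit stakes on both sides the Player's chance can be found by a direct, exact calculation and compared with (18), (21) and (25). The following is a sketch under our own assumptions: the function names are hypothetical, the method (extending a trial solution of the one-turn relation and rescaling it) is our choice and not the device of fictitious coins used in the text, and the value p = 36/73 and the sum of 60 units are merely illustrative.

```python
from fractions import Fraction

def player_chance_exact(p, A, B):
    """Chance that the Player wins B before losing A, one unit staked per turn.

    Solves h(k) = p*h(k+1) + q*h(k-1) with h(0) = 0, h(A+B) = 1 exactly:
    start from g(0) = 0, g(1) = 1, extend by the recurrence, then rescale.
    """
    q = 1 - p
    g = [Fraction(0), Fraction(1)]
    for k in range(1, A + B):
        g.append((g[k] - q * g[k - 1]) / p)
    return g[A] / g[A + B]

def player_chance_formula(p, A, B):
    """Formula (21) with alpha = q/p as in (25); formula (18) if the game is fair."""
    q = 1 - p
    if p == q:
        return Fraction(A, A + B)
    alpha = q / p
    return (alpha**A - 1) / (alpha**(A + B) - 1)

p = Fraction(36, 73)                     # an illustrative unfavourable game
print(player_chance_exact(p, 5, 5) == player_chance_formula(p, 5, 5))

# Equal fortunes of 60 (a hypothetical sum), played at various stakes:
for stake in (1, 2, 5, 10, 30, 60):
    units = 60 // stake
    print(stake, float(player_chance_formula(p, units, units)))
```

The last loop shows the chance rising as the stake is increased, the fortunes being reckoned in units of the stake; the largest value is reached when the whole sum is staked on a single turn.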
As an example of these principles let us examine the game
of  roulette,  as  played  at  Monte  Carlo.  Our  description  is 
taken  from  Sir  Hiram  Maxim  * 

'  The  roulette  consists  of  a  large  circular  basin,  about  2  feet 
in  diameter,  with  the  outer  rim  turned  inward.  The  bottom 
of the basin, which forms the wheel, is of metal, quite separate
from  the  rim  or  sides,  and  is  nicely  balanced  on  a  fine  pivot, 
so  that  when  set  in  motion,  it  will  spin  for  a  considerable 
time.  The  outer  edge  of  the  wheel  is  accurately  divided  into 
thirty-seven  sections  or  pockets,  eighteen  of  which  are  painted 
red  and  eighteen  black.  One  is  called  zero  and  is  neutral  in 
colour.     The  other  pockets  are  numbered  from  1  to  36.' 

The  wheel  is  set  in  motion,  and  a  small  ball  started  rolling 
around  the  edge  in  the  opposite  direction.  The  game  consists 
in betting on the colour or number of the division in which the
ball  comes  to  rest.  There  are  fourteen  different  methods  of 
staking ;  the  simplest  are,  red  or  black,  odd  or  even,  above 
or  below  18.  In  each  of  these  cases,  the  Player  and  Banker 
put  up  equal  sums.  If  a  player  stake  upon  a  single  number, 
the  Banker  puts  up  35  times  the  amount.  The  upper  limit 
for  a  stake  on  a  simple  chance,  as  red  or  black,  is,  or  was, 
6,000  francs,  whereas  on  a  single  number  the  limit  was  180 
francs.  When  the  ball  rolls  into  the  zero,  the  player  on 
a  simple  chance  may  either  forfeit  one  half  of  his  stake,  or 
leave  it  '  en  prison '  till  the  next  turn.  If  he  be  fortunate 
in  this  turn,  he  saves  his  stake,  but  gets  nothing  from  the 
Banker  ;  if  he  lose,  the  stake  is  gone  for  ever.  Those  who  bet 
on  individual  numbers  or  combinations,  lose  all  when  the  zero 
appears. 

Let  us  calculate  the  probability  favourable  to  the  Player. 
We  shall  imagine  that  he  is  wise  enough  (and  rich  enough)  to 
stake  6,000  francs  each  time.  We  may  also  disregard  the 
possibility  of  zero  appearing  twice  in  succession,  as  this  will 
certainly  be  very  rare.  Then  in  sets  of  74  turns  each,  the 
average  result  will  be  36  reds,  36  blacks,  1  zero  followed  by 
a  gain,  which  does  not  count,  and  1  zero  followed  by  a  loss. 
Hence $p=\tfrac{36}{73}$, $q=\tfrac{37}{73}$, $\alpha=q/p=\tfrac{37}{36}$.

* Monte Carlo Facts and Fancies, London, 1906, pp. 257 ff.
The Player's chance is

$$\left(\left(\tfrac{37}{36}\right)^{A}-1\right)\Big/\left(\left(\tfrac{37}{36}\right)^{A+B}-1\right)$$

where A and B are the two fortunes, reckoned in terms of
6,000 francs as a unit. When they are equal, his chance is

$$1\Big/\left(\left(\tfrac{37}{36}\right)^{A}+1\right).$$

Let us next suppose that the Player stakes on a number.
Has he a better or worse chance? Common sense points to
the latter, as the zero is now a sure loss, and this anticipation
is borne out by calculation. The amount which may be
staked is one thirty-third of what it was before, which
amounts to assuming that the stake remains, but that the
fortunes have been multiplied by 33. The Player's chance
will be what it was previously, if the present value of $\alpha$ be the
thirty-third root of 37/36, and will be less, if it be larger than
that. The thirty-third root of 37/36 is 1.00085; to find $\alpha$ we put

$$\alpha=1+\epsilon,\qquad p=\tfrac{1}{37},\qquad q=\tfrac{36}{37},$$

$$(1+\epsilon)^{36}-37(1+\epsilon)+36=0,$$

$$1+36\epsilon+630\epsilon^{2}+7140\epsilon^{3}-37-37\epsilon+36=0,$$

$$1+\epsilon=1.00155.$$

This method is less favourable to the Player than was the
simple chance.
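The root may also be located without truncating the expansion. The sketch below is ours: it assumes that plain bisection in floating-point arithmetic is accurate enough here, and the names `f`, `alpha_number` and `alpha_simple` are hypothetical labels for the function of (19) cleared of fractions, its root greater than 1, and the thirty-third root with which that root is compared.

```python
def f(x):
    """x^36 - 37x + 36, i.e. 37 * (p*x^(a+b) - x^a + q) with a = 1, b = 35."""
    return x**36 - 37 * x + 36

# f is negative just above x = 1 and positive at x = 2, so bisect in between.
lo, hi = 1.0 + 1e-9, 2.0
for _ in range(200):
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
alpha_number = (lo + hi) / 2
alpha_simple = (37 / 36) ** (1 / 33)     # thirty-third root used for the comparison
print(alpha_number, alpha_simple, alpha_number > alpha_simple)
```

A larger value of the root means a smaller chance for the Player, so the comparison printed last is the one on which the conclusion rests.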

Let  us  now  look  at  matters  from  the  Banker's  point  of 
view.  We  have  not  the  data  available  to  do  this  really 
correctly,  but  the  method  of  handling  a  supposititious  case 
will  show  how  a  correct  solution  might  be  obtained.  Let  us 
consider a set of runs, each of 1,100 turns, and inquire into the
chance  that  the  Banker  should  come  out  the  loser  in  one  run. 
Let  us,  for  simplicity,  assume  that  there  are  200  players,  each 
staking an average sum which we take as the unit. As
a  matter  of  observation,  not  all  players  follow  the  same 
system.  Some  bet  on  red  because  they  believe  it  is  '  Red's 
day ',  others  for  precisely  the  same  reason  bet  on  black,  as 
they  think  it  is  time  for  black  to  appear  to  even  things  up. 
Some  prospective  players  will  sit  for  a  long  time  observing 
the  runs,  and  not  betting  at  all,  until  they  have  made  up 
their  minds  what  is  happening,  or  going  to  happen.  These 
patient  watchers  are  quite  as  welcome  as  the  rasher  players, 
and quite as unlucky. However, owing to the variety of
motives, we shall come near enough to the truth if we assume
that the 200 players are divided by lot into those who back
red, and those who back black. The game, therefore, amounts
to this. When the zero appears, the Banker gets one half of
the stakes of all the players. When zero does not turn up,
the reds pay the blacks, or the blacks pay the reds, and the
Banker receives or makes good the difference. When a coin
is thrown 200 times the average numerical discrepancy, as
given by (16), is $0.798\sqrt{200\times\tfrac{1}{4}}=6$, so that we may assume
that, on the average, the reds and blacks will offset each
other, except for twelve players, with whom the Banker must
reckon.

In 1,100 turns there will be, on an average, 30 zeros.
When a zero turns up, the Banker will collect a half unit
from each player; the average winnings from the zeros will be

$$\tfrac{1}{2}\times200\times30=3{,}000.$$

To come out behind in the run of 1,100 turns, the Banker
must have an adverse discrepancy of $\tfrac{1}{6}\times3{,}000=500$ turns.
The chance for this is

$$\tfrac{1}{2}\left[1-\Theta\left(\frac{500}{\sqrt{2\times1{,}080\times\tfrac{1}{4}}}\right)\right],$$

which is so small as to be utterly negligible.

There is one more problem in ruin which is worth notice.
Assuming that A = ma, what is the probability that the
Player will be ruined exactly on the $\mu$th turn? This is the
probability that in the first $\mu-1$ turns he will win exactly
$(\mu-m)/2$ times, and lose exactly $(\mu+m-2)/2$ times, and
that he thereupon loses the $\mu$th turn. This will be *

$$\frac{(\mu-1)!}{\left(\frac{\mu-m}{2}\right)!\,\left(\frac{\mu+m-2}{2}\right)!}\;p^{(\mu-m)/2}\,q^{(\mu+m)/2}.$$


*  Incorrectly  given  by  Bertrand,  loc.  cit.,  p.  123. 


Problem. 

Work  out  the  theory  of  some  other  game  according  to   these  same 
principles. 
This expression is only correct if we assume that the
Banker is so rich that there is no possibility of his being
ruined in the interval. The sum to infinity of expressions
like this would be the probability of ruin for a player pitted
against an adversary of infinite fortune, but that probability
we have already seen is 1. Let us rather seek for what value
of $\mu$ this will be a maximum. It is to be noted that we have
here a term of the expression $q(p+q)^{\mu-1}$.

We get a similar expression by changing $\mu$ to $\mu+2$, and
equating the two we get the rather clumsy quadratic equation

$$\frac{4\mu(\mu+1)}{(\mu+2-m)(\mu+m)}\;pq=1.$$

A root of this equation will give approximately the term
desired.


CHAPTER  IV 
MEAN  VALUE  AND  DISPERSION 

§  1.    Elementary  Theorems  in  Mean  Value. 

In the course of the last chapter we had frequent occasion
to solve such problems as to find the expectation of a man
who is to receive a sum equal to the square of the discrepancy
in a certain series of trials. Ch. III, 1] showed us how an
expectation  is  the  limit  of  an  average,  as  the  number  of  trials 
increases  indefinitely.  The  reader  must  have  suspected  that 
this  whole  question  of  averages  and  expectation  was  capable 
of  much  fuller  treatment,  and  that  new  definitions  would 
help  to  clarify  the  whole  matter.  We  now  proceed  to  give 
our  undivided  attention  to  this  task. 

Definition] If a variable take the different values $V_1, V_2, \dots V_n$ with
the respective probabilities $p_1, p_2, \dots p_n$, and these are all
the possible values for that variable, then the expression

$$\sum_{i=1}^{i=n}p_iV_i$$

is called the mean value of that variable.

Definition] If V is a function of the parameters $X_1, X_2, \dots X_n$,
which vary according to the third empirical assumption,
then the integral

$$\int\!\cdots\!\int V\,F\,dX_1\,dX_2\dots dX_n$$

extended over the whole range of variation, giving to
the probability a value other than 0, shall be defined as
the mean value of the variable.
We reach at once from Ch. III, 1]

Theorem  1]  The  mean  value  of  a  variable  is  the  limit  of  its 
average  value  as  the  number  of  trials  increases  indefi- 
nitely. 
Let  us  note  in  passing  that  the  mean  value  of  a  variable 
is the expectation of a man who will receive a sum equal to
this  variable. 

In  speaking  of  variables  throughout  the  present  chapter, 
we  shall  mean  such  variables  as  take  mean  values.  We  must 
give  one  important  definition  in  connexion  with  these : 

Definition] Two variables shall be said to be independent if
the probability that one lie close to a given value is
independent of the value of the other.

Theorem 2] The mean value of the sum of two variables is the
sum of their mean values.

Let us first suppose that each can take only a finite number
of values. Let the first one take the values $x_1, x_2, \ldots, x_n$ with
the respective probabilities $p_1, p_2, \ldots, p_n$, while the second takes
the values $y_1, y_2, \ldots, y_m$ with the probabilities $\pi_1, \pi_2, \ldots, \pi_m$. Let
$P_{ij}$ be the probability that the first variable takes the value
$x_i$, while the second takes the value $y_j$. The mean value of
the sum will then be

$$\sum_{i=1,\,j=1}^{i=n,\,j=m} P_{ij}\,(x_i + y_j).$$

The total coefficient of $y_j$ is $\sum_{i=1}^{n} P_{ij}$. This is the sum of the
mutually exclusive probabilities that the first variable should
take the values $x_1, x_2, \ldots, x_n$, while the second takes the value
$y_j$. It is therefore $\pi_j$. In the same way the total coefficient
of $x_i$ is $p_i$. The expression above is thus

$$\sum_{i=1}^{n} p_i x_i + \sum_{j=1}^{m} \pi_j y_j,$$

and this is the sum of the two mean values. When one or
the other variable can take an infinite number of values we
pass from a finite sum to a definite integral by the sort of
device universally used in the integral calculus.
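The theorem may be tried on a small table by direct computation. The following Python sketch is an illustration only: the values `xs`, `ys` and the joint probabilities `P` are invented for the purpose, and `P` is deliberately chosen so that it is not of the form $p_i \pi_j$, that is, so that the two variables are dependent.

```python
from fractions import Fraction as F

# Hypothetical joint table: P[i][j] is the probability that the first
# variable takes xs[i] while the second takes ys[j].
xs = [0, 1, 2]
ys = [10, 20]
P = [[F(1, 4), F(0)],
     [F(1, 4), F(1, 4)],
     [F(0), F(1, 4)]]

# Marginal probabilities: p_i sums over j, pi_j sums over i.
p = [sum(row) for row in P]
pi = [sum(P[i][j] for i in range(len(xs))) for j in range(len(ys))]

mean_x = sum(p[i] * xs[i] for i in range(len(xs)))
mean_y = sum(pi[j] * ys[j] for j in range(len(ys)))
mean_sum = sum(P[i][j] * (xs[i] + ys[j])
               for i in range(len(xs)) for j in range(len(ys)))

print(mean_x, mean_y, mean_sum)
```

The mean of the sum agrees with the sum of the means although `P[0][1]` differs from `p[0] * pi[1]`.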

It is especially important to note that in this theorem there
is no assumption as to the independence of the variables. In
consequence of this, we can use mean values in cases where
the search for the actual probabilities is beyond our power.
The next theorem has a more restricted scope:

Theorem 3] The mean value of the product of two independent
variables is the product of their mean values.
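Using the same notation as before, since $P_{ij}$ is the probability
for the simultaneous arrival of the values $x_i$ and $y_j$, we have
by our definition above

$$\frac{P_{ij}}{P_{kl}} = \frac{p_i \pi_j}{p_k \pi_l}.$$

But

$$\sum_{i=1}^{n} p_i = \sum_{j=1}^{m} \pi_j = \sum_{i=1,\,j=1}^{i=n,\,j=m} P_{ij} = 1,$$

so that $P_{ij} = p_i \pi_j$. Mean value is

$$\sum_{i=1,\,j=1}^{i=n,\,j=m} P_{ij}\, x_i y_j = \sum_{i=1}^{n} p_i x_i \sum_{j=1}^{m} \pi_j y_j.$$

The extension to the case where the one or the other
variable can take an infinite number of values is immediate.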

Theorem 4] The mean value of the square of a variable is not
less than the square of its mean value.
Using our previous notation, we have

$$\sum_{i=1}^{n} p_i x_i^2 - \Big[\sum_{i=1}^{n} p_i x_i\Big]^2
= \sum_{i=1}^{n} p_i x_i^2 \sum_{i=1}^{n} p_i - \Big[\sum_{i=1}^{n} p_i x_i\Big]^2
= \tfrac{1}{2} \sum_{i,\,j=1}^{n} p_i p_j (x_i - x_j)^2 \ge 0.$$


Problems. 

1.  Prove  that  the  mean  value  of  the  sum  of  k  variables  is  the  sum  of 
their  mean  values. 

2. Prove that the mean value of the product of k independent variables
is the product of their mean values.

Theorem 5] The mean value of the square of the sum of n
independent variables, each of which has the mean value
0, is the sum of the mean values of their squares.

We see, in fact, that when we square our sum, we have
squared terms and product terms, and the mean value of each
of these latter is 0 by 3]. Let us go a little further in this
direction. Still assuming that the mean value of each of our
variables $x_1, x_2, \ldots, x_n$ is 0, let the mean value of the squares be
$A_1, A_2, \ldots, A_n$ respectively. We may write


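$$x_i - \frac{1}{n}\sum_{j=1}^{n} x_j = \frac{n-1}{n}\, x_i - \frac{1}{n}\sum_{j \ne i} x_j$$

and reach:

Theorem 6] Given n independent variables $x_1, x_2, \ldots, x_n$, each
with the mean value 0, while the mean values of their
respective squares are $A_1, A_2, \ldots, A_n$, then the mean value
of the expression

$$\sum_{i=1}^{n} \Big(x_i - \frac{1}{n}\sum_{j=1}^{n} x_j\Big)^2 \quad\text{is}\quad \frac{n-1}{n}\sum_{i=1}^{n} A_i.$$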


Let us next look at variables whose mean values are not 0.
Using the same notation as before, let us assume that the
n mean values are $a_1, a_2, \ldots, a_n$. Since

$$(x_i - a_i)^2 = x_i^2 - 2 a_i x_i + a_i^2$$

we have:

Theorem 7] If the mean value of a variable $x_i$ be $a_i$, while the
mean value of its square is $A_i$, then the mean value of
$(x_i - a_i)^2$ is $A_i - a_i^2$.

Theorem 8] Given n independent variables $x_1, x_2, \ldots, x_n$, whose
mean values are $a_1, a_2, \ldots, a_n$, while the mean values of
their squares are $A_1, A_2, \ldots, A_n$ respectively, then the mean
value of

$$\Big[\sum_{i=1}^{n} (x_i - a_i)\Big]^2 \quad\text{is}\quad \sum_{i=1}^{n} (A_i - a_i^2).$$


In the further discussion of these quantities, let us assume
that each can take but a finite number of values. For
instance, let $x_i$ take the values $x_{i1}, x_{i2}, \ldots$ with the
respective probabilities $p_{i1}, p_{i2}, \ldots$ Our theorem 8] may
be expressed by the equation

$$\sum_{j,\,k,\,l,\,\ldots} p_{1j}\, p_{2k}\, p_{3l} \cdots (x_{1j} + x_{2k} + x_{3l} + \cdots - a_1 - a_2 - a_3 - \cdots)^2 = \sum_{i=1}^{n} (A_i - a_i^2).$$

Let us multiply both sides by $t^2 / \big(n \sum (A_i - a_i^2)\big)$, so that the
right-hand side becomes $t^2/n$. On the left, let us leave out all terms where

$$\frac{t^2}{n \sum (A_i - a_i^2)}\, (x_{1j} + x_{2k} + x_{3l} + \cdots - a_1 - a_2 - a_3 - \cdots)^2 \le 1,$$

and replace this expression by 1 when it is greater than that.
We have thus a quantity distinctly less than $t^2/n$ which
represents the probability that this expression should be
greater than unity. Taking the contrary probability we have:

Tchebycheff's inequality *] Given n independent variables
$x_1, x_2, \ldots, x_n$ whose mean values are $a_1, a_2, \ldots, a_n$, while the
mean values of their squares are $A_1, A_2, \ldots, A_n$ respectively,
then the probability that the difference between the
average of these quantities, and the mean value of this
average, which is the average of their mean values, shall
differ from 0 by a quantity numerically not greater
than $\frac{1}{t}\sqrt{\frac{\sum (A_i - a_i^2)}{n}}$ is greater than $1 - \frac{t^2}{n}$.

* Tchebycheff, Œuvres, Petrograd, 1899, vol. i, p. 687.

In applying this inequality, we note that the expression

$$\frac{\sum (A_i - a_i^2)}{n}$$

will vary with n only between fixed limits; we may therefore
take t so large that the expression

$$\frac{1}{t}\sqrt{\frac{\sum (A_i - a_i^2)}{n}}$$

is as small as we please. Then we may take n so large that
the probability is as near to unity as we wish. In other
words, the inequality tells us that by taking enough variables,
we are almost certain that the difference between the average
and the mean value of this average, which is the average of
the mean values, shall be extremely small. The simplest case
is where all the mean values are the same, and the inequality
tells us that there is a very large probability that the difference
between the observed average and the mean value shall be
very small, which is, after all, a restatement of Ch. III, 1]. It
also leads to:

Poisson's Law of Large Numbers *] If an event be tried
repeatedly with the probabilities $p_1, p_2, \ldots$ of success,
which may be constant, or may vary with each trial,
then if the number of trials increase indefinitely, the
probability that the average probability and the
observed ratio of success will differ by
less than any assigned quantity approaches 1 as a limit.

* Poisson, 'Recherches sur la probabilité des jugements', Comptes Rendus de
l'Académie des Sciences, vol. ii, 1835. Bertrand (loc. cit., p. xxxii) comments as

Let each variable take the value 1 when the event succeeds,
0 when it fails. The mean value of the ith variable is thus
$p_i$, and that of its square is $p_i$ also. Tchebycheff's inequality tells us that we have a probability
above $1 - t^2/n$ that

$$\Big|\text{average number of successes} - \frac{\sum p_i}{n}\Big| < \frac{1}{t}\sqrt{\frac{\sum p_i - \sum p_i^2}{n}} = \frac{1}{t}\sqrt{\frac{\sum p_i q_i}{n}},$$

where $q_i = 1 - p_i$. Since no product $p_i q_i$ can exceed $\tfrac14$, we have a probability $> 1 - t^2/n$ that

$$\Big|\text{average number of successes} - \frac{\sum p_i}{n}\Big| < \frac{1}{2t}.$$

No matter how small $1/(2t)$ may be, we may make $t^2/n$ as
small as we please.
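The bound can be compared with the exact probability in a small case. The sketch below is an illustration under assumed data: the probabilities `ps` and the value of `t` are chosen arbitrarily, and the helper names are hypothetical. It builds the exact distribution of the number of successes in trials of varying probability and compares the chance that the observed ratio lies within $1/(2t)$ of the average probability with the lower bound $1 - t^2/n$.

```python
from fractions import Fraction as F

def success_distribution(ps):
    """Exact distribution of the number of successes in independent trials
    whose success probabilities are ps[0], ps[1], ..."""
    dist = [F(1)]
    for p in ps:
        new = [F(0)] * (len(dist) + 1)
        for k, w in enumerate(dist):
            new[k] += w * (1 - p)      # this trial fails
            new[k + 1] += w * p        # this trial succeeds
        dist = new
    return dist

def prob_within(ps, t):
    """Probability that |observed ratio - average probability| < 1/(2t)."""
    n = len(ps)
    avg_p = sum(ps) / n
    dist = success_distribution(ps)
    return sum(w for k, w in enumerate(dist)
               if abs(F(k, n) - avg_p) < F(1, 2 * t))

ps = [F(1, 4), F(1, 2), F(3, 4)] * 20   # n = 60 trials, illustrative values
t = 3
bound = 1 - F(t * t, len(ps))           # the lower bound 1 - t^2/n
exact = prob_within(ps, t)
print(float(bound), float(exact))
```

In cases of this kind the exact probability usually lies well above the bound, the inequality being a guarantee rather than an estimate.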

§  2.    Dispersion.* 

Suppose that we have n measurements of the same object,
or different objects $y_1, y_2, \ldots, y_n$, where

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}.$$

Then the expression

$$\sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n}} \qquad (1)$$

is called their dispersion or standard deviation. Let us find
the mean value of its square. We shall use the previous notation for the
mean value of one of our variables and for the mean value
of its square, and write also

$$a = \frac{\sum_{i=1}^{n} a_i}{n}.$$

The square of the dispersion is

$$\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n} \big\{\big((y_i - a_i) - (\bar{y} - a)\big) + [a_i - a]\big\}^2.$$

The mean value of each large round bracket is 0, as is the
mean value of the product of a large round bracket and
a square bracket. The mean value of the square of a square
bracket is its ostensible value. When it comes to finding
the mean value of the square of the large round bracket, we
may apply 6] and 7].

follows: 'Tel est le résumé fait par Poisson lui-même d'une découverte qui
se distingue bien peu des lois connues du hasard, et à laquelle il a, à peu
près seul, je crois, attaché une grande importance.'

* The first part of the present section will be found in an article by the
author, 'On the Dispersion of Observations', Bulletin American Math. Soc.,
vol. xxvii, 1921.

This brings us to the

Fundamental Dispersion Theorem] If n independent quantities
be given $y_1, y_2, \ldots, y_n$, whose mean values are $a_1, a_2, \ldots, a_n$,
while the mean values of their squares are $A_1, A_2, \ldots, A_n$
respectively, and if the average of the quantities be $\bar{y}$,
while the average of the mean values is $a$, then the mean
value of the square of the dispersion is

$$\frac{n-1}{n^2}\sum_{i=1}^{n} (A_i - a_i^2) + \frac{1}{n}\sum_{i=1}^{n} (a_i - a)^2.$$
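The theorem can be tried exactly on a few variables taking finitely many values. In the sketch below the three variables are invented for illustration, and the mean value of the squared dispersion is taken in the form $\frac{n-1}{n^2}\sum (A_i - a_i^2) + \frac{1}{n}\sum (a_i - a)^2$, which is an assumption of the sketch; the left side is found by running through every joint outcome.

```python
from fractions import Fraction as F
from itertools import product

# Each variable is a list of (value, probability) pairs; illustrative data.
variables = [
    [(0, F(1, 2)), (2, F(1, 2))],
    [(1, F(1, 3)), (4, F(2, 3))],
    [(0, F(1, 4)), (1, F(1, 4)), (5, F(1, 2))],
]
n = len(variables)

a = [sum(p * v for v, p in var) for var in variables]       # mean values a_i
A = [sum(p * v * v for v, p in var) for var in variables]   # mean squares A_i
a_bar = sum(a) / n                                           # average of the a_i

# Left side: mean value of (1/n) * sum (y_i - y_bar)^2 over all joint outcomes.
lhs = F(0)
for outcome in product(*variables):
    prob = F(1)
    for _, p in outcome:
        prob *= p                       # independence: probabilities multiply
    ys = [v for v, _ in outcome]
    y_bar = F(sum(ys), n)
    lhs += prob * sum((y - y_bar) ** 2 for y in ys) / n

# Right side: the assumed closed form.
rhs = (F(n - 1, n * n) * sum(Ai - ai * ai for Ai, ai in zip(A, a))
       + sum((ai - a_bar) ** 2 for ai in a) / n)
print(lhs, rhs)
```

Exact fractions are used so that the two sides can be compared without rounding.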

In practice we make two approximations. Firstly, when n
is reasonably large we replace $(n-1)/n$ by 1; secondly, in
accordance with Tchebycheff's inequality, since the square of
the dispersion is an average, we replace its mean value by the
observed value, thus getting the fundamental dispersion
equation

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (A_i - a_i^2) + \sum_{i=1}^{n} (a_i - a)^2. \qquad (2)$$


The  reader  will  not  forget  that  this  equation  is  merely  an 
approximation.  No  equation  connecting  observed  quantities 
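with mean values can be exact. Let us make some applications
of this. Suppose that we have N sets, each of s
observations

$$\begin{array}{ll}
x_{11},\ x_{12},\ \ldots,\ x_{1s}; & \sum_{j=1}^{s} x_{1j} = x_1, \\
x_{21},\ x_{22},\ \ldots,\ x_{2s}; & \sum_{j=1}^{s} x_{2j} = x_2, \\
\quad\vdots & \quad\vdots \\
x_{N1},\ x_{N2},\ \ldots,\ x_{Ns}; & \sum_{j=1}^{s} x_{Nj} = x_N.
\end{array}$$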

Let $a_{ij}$ be the mean value of $x_{ij}$, while $A_{ij}$ is the mean
value of its square.

Let

$$\sum_{j=1}^{s} a_{ij} = a_i; \qquad \sum_{i=1}^{N} x_i = N\bar{x}; \qquad \sum_{i=1}^{N} a_i = N a;$$

$$\sum_{j=1}^{s} (x_{ij} - x_i/s)^2 = \sum_{j=1}^{s} (A_{ij} - a_{ij}^2) + \sum_{j=1}^{s} (a_{ij} - a_i/s)^2.$$

Summing again:

$$\sum_{i=1,\,j=1}^{i=N,\,j=s} (x_{ij} - x_i/s)^2 = \sum_{i=1,\,j=1}^{i=N,\,j=s} (A_{ij} - a_{ij}^2) + \sum_{i=1,\,j=1}^{i=N,\,j=s} (a_{ij} - a_i/s)^2.$$

Again

$$x_i - a_i = \sum_{j=1}^{s} (x_{ij} - a_{ij}).$$

Hence by 5] and 8] the mean value of $(x_i - a_i)^2$ is

$$\sum_{j=1}^{s} (A_{ij} - a_{ij}^2).$$

Mean value of $x_i^2$ = mean value of $(x_i - a_i)^2 + a_i^2$, since $a_i$
is mean value of $x_i$.
Applying (2) again:

$$\sum_{i=1}^{N} (x_i - \bar{x})^2 = \sum_{i=1,\,j=1}^{i=N,\,j=s} (A_{ij} - a_{ij}^2) + \sum_{i=1}^{N} (a_i - a)^2.$$

Eliminating $(A_{ij} - a_{ij}^2)$ between this equation and the last
one which contained it, we have

$$\sum_{i=1,\,j=1}^{i=N,\,j=s} \big[(x_{ij} - x_i/s)^2 - (a_{ij} - a_i/s)^2\big] = \sum_{i=1}^{N} \big[(x_i - \bar{x})^2 - (a_i - a)^2\big]. \qquad (3)$$

In  practice  we  recognize  three  types  of  groups  of  observations. 

A. Bernoulli series. All of the observations are supposed
to bear upon the same quantity, or, at least, the mean values
are all the same. The differences of observed values would
thus be purely accidental. Here

$$a_{ij} = a_i/s, \qquad a_i = a,$$

$$\sum_{i=1,\,j=1}^{i=N,\,j=s} (x_{ij} - x_i/s)^2 = \sum_{i=1}^{N} (x_i - \bar{x})^2.$$

Such a series is said to have normal dispersion.

B. Lexis series. All observations in the same set are
supposed to be on the same quantity, but the quantity varies
from one set to another.

$$\sum_{i=1,\,j=1}^{i=N,\,j=s} (x_{ij} - x_i/s)^2 < \sum_{i=1}^{N} (x_i - \bar{x})^2.$$

This series is said to have supernormal dispersion.

C. Poisson series. Here we suppose that within a set there
is some difference among the objects, but that all sets are
comparable, $a_{ij} \ne a_i/s$, $a_i = a$,

$$\sum_{i=1,\,j=1}^{i=N,\,j=s} (x_{ij} - x_i/s)^2 > \sum_{i=1}^{N} (x_i - \bar{x})^2.$$

This series is said to have subnormal dispersion.

What we can do in practice is this. We calculate the two
quantities

$$\sum_{i=1,\,j=1}^{i=N,\,j=s} (x_{ij} - x_i/s)^2 \quad\text{and}\quad \sum_{i=1}^{N} (x_i - \bar{x})^2.$$

If they be virtually equal, we are sure that the members of
one set cannot all be the same, unless all the sets are the
same, and vice versa. If the first be less than the second,
the different sets cannot be all the same. If the first be
greater than the second, there must be a variation within
a set.
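The practical test just described is easily mechanized. The sketch below is an illustration, not a prescribed procedure: the function names are hypothetical, the two quantities are taken as $\sum (x_{ij} - x_i/s)^2$ and $\sum (x_i - \bar{x})^2$ with $x_i$ the total of the ith set, and the tolerance `rel_tol` deciding what counts as "virtually equal" is an arbitrary choice.

```python
def dispersion_quantities(sets):
    """Return (within, between) for N sets of s observations each.

    within  = sum over i, j of (x_ij - x_i/s)^2, x_i being the total of set i
    between = sum over i of (x_i - x_bar)^2, x_bar being the mean of the totals
    """
    s = len(sets[0])
    totals = [sum(row) for row in sets]
    x_bar = sum(totals) / len(sets)
    within = sum((x - t / s) ** 2 for row, t in zip(sets, totals) for x in row)
    between = sum((t - x_bar) ** 2 for t in totals)
    return within, between

def classify(sets, rel_tol=0.1):
    """Name the type of series; rel_tol is an assumed, arbitrary tolerance."""
    within, between = dispersion_quantities(sets)
    if abs(within - between) <= rel_tol * max(within, between):
        return "Bernoulli (normal)"
    if within < between:
        return "Lexis (supernormal)"
    return "Poisson (subnormal)"

# Two invented examples: variation only within sets, then only between sets.
print(classify([[1, 2, 3], [1, 2, 3]]))
print(classify([[1, 1, 1], [3, 3, 3]]))
```

With real observations the comparison is only approximate, so a tolerance of some sort is unavoidable.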

As an example, we give the observations for precipitation
in inches by month in New York City:

  Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept. Oct.  Nov.  Dec.  Total  Mean
  4.18  5.16  3.18  2.06  4.08  3.36  4.33  2.09  2.36  4.17  4.26  1.98  41.78  3.48
  2.07  0.86  5.18  6.82  7.01  0.94  5.41  6.88  2.33  2.20  1.31  6.05  47.06  3.92
  2.28  5.78  4.32  3.51  1.23  5.91  3.12  3.29  3.59  6.66  1.19  6.19  47.07  3.92
  3.44  3.83  3.65  2.88  0.33  7.42  3.33  5.96  2.60  1.55  0.90  2.81  48.60  4.05
  3.38  2.18  3.44  3.94  1.61  2.70  4.31  7.13  3.18  3.21  2.62  3.87  41.57  3.46
  3.93  2.79  3.65  2.45  1.12  4.18  6.01  5.23  7.11  2.67  1.67  3.67  44.48  3.71
  2.98  2.57  5.58  5.78  4.67  1.70  3.21  3.68  2.54  4.30  1.28  3.53  41.82  3.49
  3.26  1.52  3.80  3.89  4.08  3.29  1.18  2.48  8.00  3.82  5.05  3.91  45.28  3.77
  3.84  5.36  2.15  1.82  9.10  1.70  4.33  5.65  1.60  1.92  0.75  3.21  41.43  3.45
  3.33  4.31  3.19  5.93  1.72  3.17  1.98  7.94  2.66  0.74  1.58  5.00  41.55  3.46

$$\sum_{i=1,\,j=1}^{i=10,\,j=12} (x_{ij} - x_i/12)^2 = \ldots, \qquad \sum_{i=1}^{10} (x_i - \bar{x})^2 = 69.47.$$

This  has  the  characteristics  of  a  Poisson  series,  and  we 
conclude  that  the  rainfall  in  New  York  shows  a  greater 
tendency  to  vary  month  by  month  than  year  by  year, 
a  rather  natural  result. 

The  most  frequent  applications  of  these  tests  are  to  the 
observations  of  probabilities  or  frequency  ratios,  to  see  whether 
they  vary  from  case  to  case  or  from  set  to  set.*  Let  the 
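generic letter for one of our probabilities be $p_{ij}$, and let this
represent the probability that $x_{ij}$ takes the value 1, while in
the contrary case it takes the value 0. Then $a_i/s$ is the
average probability for the ith set, and we may put

$$s p_i = a_i = \sum_{j=1}^{s} p_{ij}; \qquad N p = \sum_{i=1}^{N} p_i = \frac{N a}{s}.$$

By the equation preceding (3)

$$\sum_{i=1}^{N} (x_i - \bar{x})^2 = \sum_{i=1,\,j=1}^{i=N,\,j=s} (p_{ij} - p_{ij}^2) + s^2 \sum_{i=1}^{N} (p_i - p)^2,$$

$$\sum_{j=1}^{s} [p_{ij} - p_i]^2 = \sum_{j=1}^{s} p_{ij}^2 - s p_i^2; \qquad \sum_{j=1}^{s} p_{ij}^2 = \sum_{j=1}^{s} [p_{ij} - p_i]^2 + s p_i^2.$$

Similarly

$$\sum_{i=1}^{N} p_i^2 = \sum_{i=1}^{N} (p_i - p)^2 + N p^2,$$

$$\sum_{i=1}^{N} (x_i - \bar{x})^2 = N s p - N s p^2 - \sum_{i=1,\,j=1}^{i=N,\,j=s} (p_{ij} - p_i)^2 + (s^2 - s) \sum_{i=1}^{N} (p_i - p)^2;$$

$$\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 = s p q - \frac{1}{N} \sum_{i=1,\,j=1}^{i=N,\,j=s} (p_{ij} - p_i)^2 + \frac{s^2 - s}{N} \sum_{i=1}^{N} (p_i - p)^2,$$

where $q = 1 - p$.

Bernoulli series: $p_{ij} = p_i = p$;

$$\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 = s p q.$$

Lexis series: $p_{ij} = p_i \ne p$;

$$\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 = s p q + \frac{s^2 - s}{N} \sum_{i=1}^{N} (p_i - p)^2.$$

Poisson series: $p_{ij} \ne p_i$, $p_i = p$;

$$\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 = s p q - \frac{1}{N} \sum_{i=1,\,j=1}^{i=N,\,j=s} (p_{ij} - p_i)^2.$$

* See Fisher, loc. cit., pp. 117 ff., also Forsyth, 'Simple Derivation for the
Formulas for the Dispersion of Statistical Series', American Math. Monthly,
vol. xxxi, 1924.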

Problem.

Work out another set of observations according to this same plan.

¶ There are cases where a study of the mean value of the
squared dispersion or discrepancy brings out the differences
between two series of trials which are otherwise seemingly
alike. Let us return to our problem of repeated trials, so
thoroughly discussed in the last chapter. Let us first have
$n_1$ trials, with a constant probability $p_1$ of success, then $n_2$
trials, with a probability $p_2$, &c. The mean value of the
number of successes will be $\sum_i n_i p_i$.

The total discrepancy will be

$$\sum_i (r_i - n_i p_i),$$

where $r_i$ denotes the number of successes in the ith series.
Since the mean value of the product of any two of the bracketed
expressions is 0, the mean value of the square of this
discrepancy is the sum of the mean values of the squares of the
discrepancies of the individual series, i.e. $\sum_i n_i p_i - \sum_i n_i p_i^2$.

Suppose, secondly, we take $n = \sum_i n_i$ trials of an event,
where the probability for success is

$$p = \frac{\sum_i n_i p_i}{\sum_i n_i}.$$

The mean value for the number of successes will be as
before. The mean value for the square of the discrepancy is

$$n p (1 - p) = \sum_i n_i p_i - \frac{\big(\sum_i n_i p_i\big)^2}{\sum_i n_i}
= \sum_i n_i p_i - \sum_i n_i p_i^2 + \frac{\sum_i n_i \sum_i n_i p_i^2 - \big(\sum_i n_i p_i\big)^2}{\sum_i n_i}.$$

We see that in the second case the mean value for the
number of successes is the same, but that for the squared
discrepancy is greater.
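A numerical instance makes the comparison concrete. In the sketch below the numbers of trials `ns` and the probabilities `ps` are invented, and the function names are hypothetical; the mean square discrepancy is computed for the divided series and for the single series at the pooled probability, together with the excess term $\big(\sum n_i \sum n_i p_i^2 - (\sum n_i p_i)^2\big) / \sum n_i$.

```python
from fractions import Fraction as F

def split_series(ns, ps):
    """Mean square discrepancy when n_i trials are made at probability p_i."""
    return sum(n * p * (1 - p) for n, p in zip(ns, ps))

def pooled_series(ns, ps):
    """Mean square discrepancy for sum(n_i) trials at the single probability
    p = sum(n_i p_i) / sum(n_i)."""
    n = sum(ns)
    p = sum(ni * pi for ni, pi in zip(ns, ps)) / n
    return n * p * (1 - p)

ns = [10, 20, 30]                      # illustrative numbers of trials
ps = [F(1, 5), F(1, 2), F(9, 10)]      # illustrative probabilities
split = split_series(ns, ps)
pooled = pooled_series(ns, ps)
excess = (sum(ns) * sum(n * p * p for n, p in zip(ns, ps))
          - sum(n * p for n, p in zip(ns, ps)) ** 2) / sum(ns)
print(split, pooled, excess)
```

The excess vanishes only when all the $p_i$ are equal, which agrees with Theorem 4] applied with the weights $n_i / \sum n_i$.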

¶ Here is an even more instructive example of the same
kind.* The problem of repeated trials may be stated in the
following way. An urn contains a large number N of balls,
of which Np are white and Nq are black. A ball is taken
out and replaced n times in succession, what is the probability
for seeing just r white and $n - r$ black balls? This problem
we have solved completely. We now take up the analogous
problem where the ball extracted is not replaced. The
probability for just r whites and $n - r$ blacks is now

$$\frac{(Np)!}{r!\,(Np - r)!} \times \frac{(Nq)!}{(n - r)!\,[Nq - (n - r)]!} \div \frac{N!}{n!\,(N - n)!}.$$

This is a maximum with

$$\frac{1}{r!\,(Np - r)!} \cdot \frac{1}{(n - r)!\,[Nq - (n - r)]!}.$$

The ratio of this to the next term is

$$\frac{r + 1}{Np - r} \cdot \frac{N(1 - p) - (n - r) + 1}{n - r},$$

which is very close to

$$\frac{r}{Np - r} \cdot \frac{Nq - (n - r)}{n - r},$$

and this will be 1 when $r = np$.

The most likely number of white balls will be as before.

* Castelnuovo, loc. cit., p. 41.

Let us find the mean number of white balls. This is the
expectation of a man who shall receive one pistole for every
white ball that appears, and nothing for a black one, and
this is the sum of his expectations from the individual balls
drawn, and so is $p + p + p + \ldots = np$, and this is just the mean
value for the number of successes that we got before. Now let
us find the mean value for the square of the discrepancy.
Let $X_i$ take the value 1 if the ith ball be white, 0 if it be black.
Then the mean value of $\sum X_i$ is the mean value just found.

Furthermore, let $Y_i = X_i - p$. We wish to find the mean
value of $(\sum Y_i)^2$.

We have the following table of values:

$Y_i = q$, probability $p$; $Y_i = -p$, probability $q$;
$Y_i^2 = q^2$, probability $p$; $Y_i^2 = p^2$, probability $q$;
$Y_i Y_j = q^2$, probability $\frac{p(Np - 1)}{N - 1}$;
$Y_i Y_j = p^2$, probability $\frac{q(Nq - 1)}{N - 1}$;
$Y_i Y_j = -pq$, probability $\frac{2\,Np \cdot Nq}{N(N - 1)} = \frac{2Npq}{N - 1}$.

Hence the mean value of $(\sum_i Y_i)^2$ is

$$npq - \frac{n(n - 1)\,pq}{N - 1}.$$

Comparing this experiment with that where the balls are
replaced, we see that the most likely number of white
balls, and the mean number, are the same, but the mean
value of the square of the discrepancy is decreased, and we
should expect to see less dispersion.
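The comparison can be verified exactly for a small urn. The sketch below is illustrative: the urn sizes are invented and the function name is hypothetical. It computes, from the probabilities for $r$ white balls drawn without replacement, the mean number of whites and the mean square discrepancy, and sets the latter beside $npq$ and beside $npq - n(n-1)pq/(N-1)$.

```python
from fractions import Fraction as F
from math import comb

def without_replacement(N, white, n):
    """Exact mean and mean square discrepancy of the number of white balls
    when n balls are drawn without replacement from N, `white` being white."""
    total = comb(N, n)
    probs = {r: F(comb(white, r) * comb(N - white, n - r), total)
             for r in range(max(0, n - (N - white)), min(n, white) + 1)}
    mean = sum(r * w for r, w in probs.items())
    msd = sum((r - mean) ** 2 * w for r, w in probs.items())
    return mean, msd

N, white, n = 20, 8, 5                  # illustrative urn and number of draws
p = F(white, N)
q = 1 - p
mean, msd = without_replacement(N, white, n)
with_replacement = n * p * q            # value when each ball is put back
print(mean, msd, with_replacement)
```

The reduction is small when N is large compared with n, as the factor $1/(N-1)$ suggests.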


CHAPTER  V 


GEOMETRICAL  PROBABILITY 

In the third empirical assumption of the first chapter, we
assumed that when an event depended upon n independent
variables, varying in an n-dimensional continuum, there
existed such an analytic function F that the probability that
the variables should take values lying in an n-dimensional
sub-manifold was expressed by the integral

$$\int\!\cdots\!\int F \, dX_1 \, dX_2 \cdots dX_n$$

extended over that manifold. By a proper change of variables
we then saw that this probability might be expressed by
the ratio

$$\frac{\int\!\cdots\!\int dx_1 \, dx_2 \cdots dx_n}{\int\!\cdots\!\int dx_1 \, dx_2 \cdots dx_n} \qquad (1)$$

where the integration in the numerator is over the corresponding
sub-manifold for the new variables, while that in the
denominator is over the total field of possible variation.

The  great  difficulty  in  handling  problems  in  this  continuous 
or  geometrical  probability  consists  in  determining  which 
variables  to  take  in  order  to  express  the  probability  in  the 
form  (1).  This  difficulty  can  be  brought  out  most  clearly  by 
one  or  two  specific  examples.  Here  is  a  variation  on  one 
that  appeared  in  the  first  chapter.  Suppose  that  a  number  is 
chosen at random between 1 and 3, what is the probability
that it lies between 1 and 2? The natural mode of procedure
is as follows. All regions of the interval being supposed
equally plausible, we take the number itself as the independent
variable, which amounts to assuming that the probability
that it lies within an interval is proportional to the length
of that interval. Thus, for our particular problem, the
probability sought is

$$\frac{\int_1^2 dx}{\int_1^3 dx} = \frac{1}{2}.$$

Leaving this answer for a moment, let us next assume that
a number is chosen at random between 1 and $\tfrac13$, what is the
probability that it lies between 1 and $\tfrac12$? Following the
same reasoning as before, we have

$$\frac{\int_{1/2}^{1} dx}{\int_{1/3}^{1} dx} = \frac{1/2}{2/3} = \frac{3}{4}.$$

But we must now notice that if a number lie between 1
and $\tfrac13$, its reciprocal lies between 1 and 3, whereas if it lie
between 1 and $\tfrac12$, its reciprocal lies between 1 and 2, and
the question arises, have we not found two incompatible
answers to the same problem?
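The two computations may be set side by side. The short sketch below is illustrative and its function name is hypothetical; it treats first the number and then its reciprocal as the evenly distributed variable, and evaluates the same event in each case with exact fractions.

```python
from fractions import Fraction as F

def uniform_prob(lo, hi, a, b):
    """Probability that a quantity spread evenly over [lo, hi] lies in [a, b]."""
    return F(b - a) / F(hi - lo)

# The number itself is the independent variable on [1, 3].
direct = uniform_prob(1, 3, 1, 2)

# The reciprocal is the independent variable on [1/3, 1]; the event
# "number between 1 and 2" is the event "reciprocal between 1/2 and 1".
via_reciprocal = uniform_prob(F(1, 3), 1, F(1, 2), 1)

print(direct, via_reciprocal)   # 1/2 3/4
```

Both figures are correct consequences of their respective assumptions; the disagreement lies wholly in the choice of the variable taken as evenly distributed.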

A  neater  paradox  of  the  same  sort  is  due  to  Bertrand.* 

Example 1] A chord is drawn at random across a circle: what
is the probability that it is at least as long as a radius?

First reasoning. The direction of the chord is obviously
immaterial, as the circle lies symmetrically about the centre.
All depends upon the distance of the chord from the centre of
the circle. As we have nothing to guide us here, we assume
that all such distances, not greater than a radius, are equally
likely. The chord will be as large as a radius if this distance
be $\le \frac{\sqrt{3}}{2}\, r$. Our probability is, then, $\frac{\sqrt{3}}{2} = 0.866+$.

Second reasoning. The position of the first intersection
with the circle is immaterial, owing to this same symmetry,
all depends upon the second intersection. All positions for
this second intersection being equally likely, all angles between
the chord and the radius are equally likely. The chord will
be as great as a radius if this angle be not over 60°, and the
probability is $\tfrac23 = 0.666+$.

* Bertrand, loc. cit., p. 4.

Which  answer  is  right?  Neither,  in  an  absolute  sense. 
It  would  be  easy  to  try  the  matter  out  experimentally  in 
such  a  way  that  the  frequency  approached  the  one  or  the 
other  as  a  limit.  If  a  disk  were  cut  out  of  cardboard,  and 
were  thrown  at  random  on  a  table  ruled  with  parallel  lines 
a  diameter  apart,  then  one  and  only  one  of  these  lines  would 
cross  the  disk.  All  distances  from  the  centre  would  be 
equally likely, and we should have a ratio approaching the
first  answer.  On  the  other  hand,  if  the  disk  were  held  by 
a  pivot  through  a  point  on  its  edge,  which  point  lay  upon 
a  certain  straight  line,  and  were  then  spun  with  a  random 
velocity  about  the  pivot,  the  frequency  ratio  would  approach 
the  second  value.  The  best  that  we  can  ever  do  in  almost 
any  case  is  to  make  the  best  guess  as  to  the  proper  independent 
variable  which  our  common  sense  can  suggest,  and  calculate 
a  tentative  answer  therefrom. 
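In default of cardboard and a pivot, the two experiments may be imitated by random numbers. The sketch below is an illustration under stated assumptions: the first sampler takes the distance of the chord from the centre as evenly distributed, the second takes the angle between chord and radius as evenly distributed between 0° and 90°; the function names, the number of trials and the seed are arbitrary.

```python
import math
import random

def chord_by_distance(rng, r=1.0):
    """First reasoning: distance from the centre uniform on [0, r]."""
    d = rng.uniform(0.0, r)
    return 2.0 * math.sqrt(r * r - d * d)

def chord_by_angle(rng, r=1.0):
    """Second reasoning: angle between chord and radius uniform on [0, 90 deg]."""
    phi = rng.uniform(0.0, math.pi / 2)
    return 2.0 * r * math.cos(phi)

def frequency(sampler, trials=200_000, seed=0):
    """Fraction of sampled chords at least as long as the radius (r = 1)."""
    rng = random.Random(seed)
    return sum(sampler(rng) >= 1.0 for _ in range(trials)) / trials

f1 = frequency(chord_by_distance)
f2 = frequency(chord_by_angle)
print(round(f1, 3), round(f2, 3))
```

Each frequency settles near the value given by its own reasoning, which is all that either reasoning can claim.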

¶ It is fair to say in this connexion that there are exceptional
problems where the answer is independent of the choice of
the independent variables. The following one is due to the
genius of Poincaré.* A wheel turning freely about a fixed
horizontal axis is divided into a large even number of equal
divisions, painted alternately red and black. The wheel is
set spinning. What is the probability that when it comes to
rest, a fixed point near the periphery will be opposite a red
sector? The result, red or black, will depend upon the total
angle $\theta$ of spin after a marked point on the wheel has passed
the fixed point for the first time. Let $f(\theta)\,d\theta$ be the probability
that this angle shall be in the interval $\theta \pm \tfrac12 d\theta$. This function
is strictly unknown, but we may assume that it is continuous,
with a continuous first derivative, and that the value of this
latter is always numerically $\le M$. We take $\theta$ as an abscissa,
and plot the curve $y = f(\theta)$.

* Poincaré, loc. cit., p. 127.

An infinite value of $\theta$ being impossible, let us suppose that
the whole region of variation for $\theta$ runs from 0 to $n\epsilon = l$,
where $\epsilon$ is the size of one angular division of the wheel. Let
us show that very nearly one half the area under the curve is
under those regions, shaded in the accompanying figure, which
correspond to the red sectors. If $M_1$ and $M_2$ be the maximum
and minimum values for $f$ at points of two adjacent divisions,
then the difference in area between the two cannot exceed

$$2\epsilon (M_1 - M_2).$$

We next note that, by the law of the mean, $M_1 - M_2$ is equal
to the difference between the corresponding abscissas, multiplied
by the numerical value of the slope of the tangent at
some intermediate point, i.e. $(M_1 - M_2) \le 2\epsilon M$.

Fig. 1.

The difference between succeeding areas is thus $< 4\epsilon^2 M$, and
the total difference between shaded and unshaded areas, i.e.
the total difference between the probability for ending
opposite a red or a black sector, is

$$< \frac{n}{2} \times 4\epsilon^2 M = 2 l \epsilon M.$$

As $\epsilon$ decreases indefinitely, $l$ is constant, as is $M$; our theorem
is thus proved.
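The shrinking of the difference may be watched numerically. The sketch below is only an illustration: the density $f(\theta) = 2\theta$ on the range from 0 to $l = 1$ is an invented example with $M = 2$, the even-numbered divisions are arbitrarily taken as the red ones, and the function names are hypothetical. It computes the difference between the red and black areas for several values of n and prints it beside the bound $2 l \epsilon M$.

```python
def density(theta):
    """Illustrative density on [0, 1]: f(theta) = 2*theta, so |f'| <= M = 2."""
    return 2.0 * theta

def red_minus_black(f, l, n, steps=50):
    """Area under f over the even-numbered divisions minus the area over the
    odd-numbered ones, for n equal divisions of [0, l] (midpoint rule)."""
    eps = l / n
    h = eps / steps
    diff = 0.0
    for k in range(n):
        area = sum(f(k * eps + (m + 0.5) * h) for m in range(steps)) * h
        diff += area if k % 2 == 0 else -area
    return diff

l, M = 1.0, 2.0
for n in (10, 100, 1000):
    eps = l / n
    print(n, abs(red_minus_black(density, l, n)), 2 * l * eps * M)
```

Any other density with a bounded slope could be substituted for `density`, provided `M` is changed to match.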

We  have  thus  seen  that  the  answers  given  to  problems  in 
geometrical  probability  are  subject  to  considerable  suspicion, 
still  it  is  certainly  true  that  there  are  quite  a  number  of  cases 
where  the  choice  of  the  independent  variable  is  clearly  dictated 
by  the  circumstances,  and  where,  as  a  matter  of  fact,  the 
results are found to check up well in practice. Such problems
are also valuable as exercises in the integral calculus, and
are, consequently, popular in text-books upon that subject.
We shall give a number of the most entertaining.*

Example 2] A line of given length is divided into three parts:
what is the probability that these can be put together to
form a triangle?

Let x be the abscissa of the point which is marked first, x'
of that which is marked second. If the point marked first be
to the left of that marked second, of which the probability is $\tfrac12$,
its abscissa must lie between 0 and $l/2$, where $l$ is the length of
the line. The abscissa of the second point must then lie
between $l/2$ and $l/2 + x$. The probability for this is

$$\frac{1}{l^2} \int_0^{l/2} dx \int_{l/2}^{x + l/2} dx' = \frac{1}{8}.$$

There is an equal probability when the point marked first
is to the right of the other, hence the total probability is $\tfrac14$.

Here is another solution which is simple and amusing. Let
the length of the line be 1, and the lengths of the parts be
x, y, and z,

$$x + y + z = 1,$$

$$y + z > x, \qquad z + x > y, \qquad x + y > z.$$

We may take x, y, and z as the distances of a point from
the sides of an equilateral triangle whose sides have the
lengths $2/\sqrt{3}$, the point being within the triangle. The three
inequalities will prevent the point from being further from
any one side than one half the length of the median thereon.
It must therefore lie within the similar triangle whose vertices
are the middle points of the given sides, and this smaller
triangle has one-fourth of the area of the larger one.
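The value one-fourth may be checked by trial. The sketch below is an illustration under the assumption that the two points of division are marked independently and evenly along a line of length 1; the function names, the number of trials and the seed are arbitrary.

```python
import random

def can_form_triangle(x, y, z):
    """Three lengths form a triangle when each is less than the sum of the others."""
    return x < y + z and y < z + x and z < x + y

def triangle_frequency(trials=200_000, seed=0):
    """Mark two points independently and evenly on a line of length 1 and
    return how often the three parts can be put together as a triangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u, v = sorted((rng.random(), rng.random()))
        hits += can_form_triangle(u, v - u, 1.0 - v)
    return hits / trials

print(triangle_frequency())
```

A different manner of dividing the line, for instance breaking off the second part from the longer remnant, would be a different problem and would give a different frequency.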

*  The  best  collection,  from  which  the  following  problems  are  taken,  is 
Czuber's  Geometrische  Wahrscheinlichkeiten,  Leipzig,  1884. 


Problems. 

1.  Three  lengths  are  taken  at  random,  not  greater  than  three  given 
lengths  a,  b,  and  c  :  what  is  the  probability  that  they  can  be  combined  to 
form  a  triangle  ? 

2. Two points are taken at random on a line segment of length a: what
is the probability that their distance shall not exceed b?

¶ Example 3] Given the quadratic equation

$$x^2 + 2px + q = 0, \qquad -P \le p \le P, \qquad -Q \le q \le Q,$$

what is the probability that the roots are real?
Let us take q and p as abscissa and ordinate of a point in
the plane. The total region is the rectangle whose four
corners are the points $(\pm Q, \pm P)$, the area being $4PQ$. The
favourable region is not within the parabola $p^2 = q$.

Fig. 2. (Two cases: $P^2 > Q$ and $P^2 < Q$.)

In the first case where $P^2 > Q$, the chance for imaginary
roots is

$$\frac{2}{4PQ} \int_0^{Q} \sqrt{q}\, dq = \frac{1}{3}\, \frac{\sqrt{Q}}{P}.$$

Favourable chance is

$$1 - \frac{1}{3}\, \frac{\sqrt{Q}}{P} > \frac{2}{3}.$$

In the second case where $P^2 < Q$, the favourable chance is

$$\frac{1}{2} + \frac{1}{4PQ} \int_{-P}^{P} p^2\, dp = \frac{1}{2} + \frac{P^2}{6Q} < \frac{2}{3}.$$
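The two cases may be gathered into one function and compared with random trials. The sketch below is illustrative: it assumes that p and q are spread evenly over their ranges, takes the favourable chances as $1 - \sqrt{Q}/(3P)$ when $P^2 \ge Q$ and $\tfrac12 + P^2/(6Q)$ otherwise, and uses hypothetical function names with an arbitrary number of trials and seed.

```python
import math
import random

def real_root_probability(P, Q):
    """Chance that x^2 + 2px + q = 0 has real roots (p*p >= q) when p is
    spread evenly over [-P, P] and q over [-Q, Q]."""
    if P * P >= Q:
        return 1.0 - math.sqrt(Q) / (3.0 * P)
    return 0.5 + P * P / (6.0 * Q)

def simulated(P, Q, trials=200_000, seed=0):
    """Observed frequency of real roots in random trials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = rng.uniform(-P, P)
        q = rng.uniform(-Q, Q)
        hits += (p * p >= q)
    return hits / trials

for P, Q in [(2.0, 1.0), (1.0, 4.0)]:
    print(P, Q, real_root_probability(P, Q), simulated(P, Q))
```

The two expressions agree at $P^2 = Q$, where each gives $\tfrac23$.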


Example 4] Two points are taken at random within a circle:
what is the probability that the circle through them and
the centre of the given circle does not go outside of the
latter?

The radius of the circle being unity, the probability that
a given point shall lie at a distance from the centre between
x and x + dx is not dx/1, as one might hastily assume, but is
the ratio of the area of the ring containing all such points to
the area of the circle, namely

$$(1/\pi)\, 2\pi x\, dx = 2x\, dx.$$

If the circle through P, Q, and O, the given centre, do not
go outside the given circle, the point Q must lie within one
of the two circles of radius $\tfrac12$ passing through O and P. The
distance between the centres of these is

$$2\Big(\frac{1}{4} - \frac{x^2}{4}\Big)^{1/2} = (1 - x^2)^{1/2}.$$

The common chord subtends at the centre of each an angle
$2\sin^{-1} x$, the area of the favourable region for Q is

$$[2\pi - 2\sin^{-1} x]\Big(\frac{1}{2}\Big)^2 + x\Big(\frac{1}{4} - \frac{x^2}{4}\Big)^{1/2}
= \frac{1}{2}\big(\pi - \sin^{-1} x + x[1 - x^2]^{1/2}\big).$$

Hence the probability sought is

$$\frac{1}{\pi}\int_0^1 \big([\pi - \sin^{-1} x]\, x + x^2 [1 - x^2]^{1/2}\big)\, dx = \frac{7}{16}.$$

We  now  come  to  the  most  famous  of  all  problems  in 
geometrical  probability. 

Buffon's  needle  problem.* 

A smooth table is ruled with parallel lines separated by
a distance d. A needle whose length is l, less than d, is
thrown at random on the table. What is the probability
that it will cross one of the parallels?

The chance that the distance from the centre of the needle
to the nearest parallel should lie between the limits x and
x + dx is 2dx/d. The chance that in this case the needle
should cross the nearest parallel is

$$\frac{2}{\pi}\cos^{-1}\frac{2x}{l}.$$

Hence the probability required is

$$\frac{4}{\pi d}\int_0^{l/2}\cos^{-1}\frac{2x}{l}\,dx = \frac{2l}{\pi d}\int_0^1 \cos^{-1}y\,dy = \frac{2l}{\pi d}. \qquad (2)$$

* Buffon, 'Essai d'arithmétique morale'. See his Œuvres, ed. 1801 (An.
VIII), vol. xxi, pp. 163 ff.
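The needle experiment is easily imitated by machine. The sketch below, in Python, is ours and not Buffon's: the function names, the number of throws, and the fixed seed are assumptions made for illustration. It draws the distance x of the centre from the nearest parallel uniformly on (0, d/2) and the angle between the needle and the normal to the rulings uniformly on (0, π/2), exactly as in the derivation above, and compares the observed frequency of crossings with 2l/πd.

```python
import math
import random


def buffon_probability(l: float, d: float) -> float:
    """Formula (2): chance that a needle of length l <= d crosses a parallel."""
    return 2.0 * l / (math.pi * d)


def simulate_buffon(l: float, d: float, trials: int = 100_000, seed: int = 1) -> float:
    """Frequency of crossings in simulated throws (seed fixed for repeatability)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(trials):
        x = rng.uniform(0.0, d / 2.0)          # centre to nearest parallel
        phi = rng.uniform(0.0, math.pi / 2.0)  # angle between needle and the normal
        if (l / 2.0) * math.cos(phi) >= x:     # half-needle reaches the parallel
            crossings += 1
    return crossings / trials


if __name__ == "__main__":
    l, d = 1.0, 2.0
    print(buffon_probability(l, d), simulate_buffon(l, d))
```

With l = 1 and d = 2 the theoretical value is 1/π, so the reciprocal of the observed frequency gives a rough estimate of π.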


Another simple and ingenious solution was found by
Barbier.* The probability of crossing a line is the expectation
of a man who shall receive one pistole if the needle cross,
and none if it do not. This is the sum of the expectations
from the various infinitesimal segments of the needle, and will
not be altered if the latter be bent in any way. Assuming,
then, that the needle is made of such inferior steel that it can
be bent into the form of a circle, of diameter l/π, the probability
that the circle shall cross one of our lines is l/πd; but
the expectation is double that, for if it cross once it will cross
twice. This gives the same answer as before.

* Barbier, Liouville's Journal, Series 2, vol. v, 1860, pp. 273 ff., contains
a number of interesting problems.

Problems.

1. The points P and Q are taken at random in a circle. What is the
probability that the circle with P as centre and radius PQ will lie inside?

2. Two points are taken at random within a circle. What is the
probability that the perpendicular from the centre on their line does
not pass between them?

3. Do Buffon's needle problem when the length of the needle is greater
than the distance between the parallels.

Buffon's needle problem has induced a number of persons
to try the experiment of calculating π experimentally in this
way. The most elaborate series of experiments was carried
out in the year 1901 by Lazzerini,† who made 3,408 trials and
got the value π = 3.1415929, an error of 0.0000003.

Let us pause for a moment to discuss this result. The
natural method in such cases is to treat the problem as one in
relative discrepancy, and find the probability that the latter
should be within assigned limits. But here the relative
discrepancy is so small that the discrepancy in the number
of crossings would be less than unity, and the safest plan
is to assume that there was no absolute discrepancy at all.
The probability for that, according to Ch. III (9), is

$$\frac{1}{\sqrt{2\pi npq}}.$$

We have no information as to the relative lengths of l
and d, but probably shall make no great error in the final
conclusion if we make the simple assumption

$$2l = d; \qquad p = 1/\pi.$$

The probability for finding no discrepancy will then be

$$\frac{1}{\sqrt{2\pi\cdot 3408\cdot\frac{1}{\pi}\left(1 - \frac{1}{\pi}\right)}} < \frac{1}{68}.$$

It is much to be feared that in performing this experiment
Lazzerini 'watched his step'.

† Lazzerini, 'Una applicazione del calcolo delle Probabilità', Periodico di
Matematica, (2) vol. iv, 1901, pp. 140 ff.
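The chance of hitting the most likely number of crossings exactly may be computed in a line or two. The Python sketch below assumes that the formula cited from Ch. III is the usual approximation 1/√(2πnpq), which is not reproduced in this section, and it adopts the assumption 2l = d, so that p = 1/π; the function name is our own.

```python
import math


def prob_no_discrepancy(n: int, p: float) -> float:
    """Approximate chance that n trials give exactly the most likely count,
    taken here (as an assumption) to be 1 / sqrt(2*pi*n*p*q)."""
    q = 1.0 - p
    return 1.0 / math.sqrt(2.0 * math.pi * n * p * q)


# Lazzerini's 3,408 throws, with the assumed ratio 2l = d, i.e. p = 1/pi.
lazzerini = prob_no_discrepancy(3408, 1.0 / math.pi)

if __name__ == "__main__":
    print(lazzerini)
```

The result is a chance of roughly one in sixty-eight, which is why so exact an agreement invites suspicion.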

Problem.

Captain Fox (Messenger of Mathematics, vol. ii, pp. 113, 114) made 1,120
trials of Buffon's needle problem with the resulting value of π, 3.1419.
Discuss this result.

Barbier's method of solving Buffon's needle problem is
easily extended to other cases, and gives an easy solution of
the more difficult problem of finding the probability that
a line shall cross a closed convex contour or oval. We shall
imagine that the experiment is carried out in such a way
that we are justified in taking as independent variables the
distance of the line from a fixed point, and its angle with
a fixed direction. If these numbers be p and θ, and if we
slide the origin to the point (x₀, y₀) and swing through the
angle φ,

$$p' = x_0\cos(\theta-\phi) + y_0\sin(\theta-\phi) + p, \qquad \theta' = \theta - \phi,$$

$$\frac{\partial p'}{\partial p} = 1, \qquad \frac{\partial p'}{\partial\theta} = -x_0\sin(\theta-\phi) + y_0\cos(\theta-\phi),$$

$$\frac{\partial(p',\theta')}{\partial(p,\theta)} = 1.$$

Since this Jacobian is equal to 1, the probabilities for an
arbitrary line are independent of the point and direction of
reference for the normal coordinates, an important point
easily overlooked. The probability that a line shall pass
between two given points is proportional to the length of their
segment, and is independent of the direction of the line when
every line passing between them is permissible. The probability
that a line shall cross an oval is one half the expectation
of a man who shall receive one pistole for each intersection,
tangency counting double, and this, in turn, is one half the
sum of the expectations for each linear element. It is, therefore,
proportional to the perimeter of the oval. If a line cross
a certain oval it may, or may not, cross a second oval within
the first; it cannot, however, cross the latter without crossing
the former. The probability of crossing the inside oval is,
thus, the probability of crossing the first, multiplied by the
probability that, having crossed the first, it shall also cross
the second. We thus find the latter probability by dividing the
probability of crossing the inside oval by that of crossing
the outside one, and the factor of proportionality cancels out,
giving us:

Theorem 1] The probability that a line which crosses a given
oval shall also cross a second such oval inside the first
is the ratio of the perimeters of the two.

In particular, to solve Buffon's needle problem, we have
merely to treat the needle as an extremely thin oval, and glue
it on a circular disk of diameter d.

The probability that a line segment should intersect an
oval is the expectation of a man who shall receive one pistole
if the segment meet the oval once or twice. If the segment
be extremely short, the probability of this latter is negligible
in comparison with that of the former. The probability
that a short segment should meet an oval is proportional to
the product of its length and the probability that its line
should meet the oval, i.e. proportional to the product of its
length multiplied by the perimeter of the oval.

Theorem 2] The probability that two ovals should intersect
is proportional to the product of their perimeters.

Let us find the probability that a line should intersect two
mutually exterior ovals. Let them be connected by direct
and transverse common tangents as shown in Fig. 3. The
probability of meeting the outside contour is the probability
of meeting at least one half of the figure ∞. This is proportional
to the total perimeter of the ∞ less the probability of
meeting both parts of the ∞, which is the probability of
meeting both ovals.

Fig. 3.

Theorem 3] The probability that a line shall intersect two
mutually exterior ovals is proportional to the difference
between the perimeter of the figure ∞ formed by the
ovals and their transverse common tangents, and the
perimeter of the convex figure formed by the ovals and
their direct common tangents.

Example 5] If a line cross a rectangle of dimensions a and
b, what is the probability that it will cross two opposite
sides?

We consider the sides as indefinitely thin ovals, and add the
probabilities for each pair.

$$\frac{2\sqrt{a^2+b^2} + 2a - 2(a+b)}{2(a+b)} + \frac{2\sqrt{a^2+b^2} + 2b - 2(a+b)}{2(a+b)} = \frac{2\sqrt{a^2+b^2}}{a+b} - 1.$$

Let us calculate this probability again, taking as independent
variables the positions of the points on the perimeter. The
probability that the first intersection should be on a side a,
and that the second intersection, which must not be on the
same side, should be on the opposite one is

$$\frac{a}{a+b}\cdot\frac{a}{a+2b} = \frac{a^2}{(a+b)(a+2b)}.$$

We have an analogous probability when the first intersection
is on a side b; adding the two together we get

$$\frac{1}{a+b}\left[\frac{a^2}{a+2b} + \frac{b^2}{2a+b}\right] = \frac{2a^2 - ab + 2b^2}{(a+2b)(2a+b)},$$

and this is somewhat less.
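The difference between the two answers of Example 5 lies wholly in the choice of independent variables, and a trial by machine makes this plain. The Python sketch below is our own illustration; the function names, the sample size, and the seed are assumptions. It draws lines with the distance p from the centre and the angle θ of the normal uniformly distributed, which are the variables adopted in this chapter, keeps those which cross the rectangle, and counts how often two opposite sides are met. The two closed forms are given for comparison.

```python
import math
import random


def opposite_sides_by_lines(a: float, b: float) -> float:
    """First answer: independent variables are the normal coordinates of the line."""
    return 2.0 * math.hypot(a, b) / (a + b) - 1.0


def opposite_sides_by_points(a: float, b: float) -> float:
    """Second answer: independent variables are points on the perimeter."""
    return (a * a / (a + 2.0 * b) + b * b / (2.0 * a + b)) / (a + b)


def simulate_opposite_sides(a: float, b: float, trials: int = 100_000, seed: int = 1) -> float:
    """Random lines x*cos(theta) + y*sin(theta) = p against a rectangle
    of sides a (along x) and b (along y) centred at the origin."""
    rng = random.Random(seed)
    half_diag = math.hypot(a, b) / 2.0
    crossing = opposite = 0
    for _ in range(trials):
        theta = rng.uniform(0.0, math.pi)
        p = rng.uniform(-half_diag, half_diag)
        c, s = math.cos(theta), math.sin(theta)
        hits = set()
        if abs(s) > 1e-12:  # the line meets the vertical lines x = +a/2, -a/2
            if abs((p - c * a / 2.0) / s) <= b / 2.0:
                hits.add("right")
            if abs((p + c * a / 2.0) / s) <= b / 2.0:
                hits.add("left")
        if abs(c) > 1e-12:  # the line meets the horizontal lines y = +b/2, -b/2
            if abs((p - s * b / 2.0) / c) <= a / 2.0:
                hits.add("top")
            if abs((p + s * b / 2.0) / c) <= a / 2.0:
                hits.add("bottom")
        if hits:
            crossing += 1
            if {"left", "right"} <= hits or {"top", "bottom"} <= hits:
                opposite += 1
    return opposite / crossing


if __name__ == "__main__":
    print(opposite_sides_by_lines(1.0, 1.0), opposite_sides_by_points(1.0, 1.0),
          simulate_opposite_sides(1.0, 1.0))
```

For a square the first formula gives √2 − 1 and the second ⅓; the simulated lines should follow the first, since they are drawn with the first set of variables.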

What will be the probability of passing between two ovals?
This is clearly the difference between the probabilities of
meeting the outside contour and that of meeting at least one
oval, whence, by the theorem of total probability, general
case, we get:

Theorem 4] The probability that a line will pass between
two mutually exterior ovals is proportional to the
difference between the perimeter of the ∞ and the sum
of their perimeters.

Example 6] Two secants are drawn across an oval: what is the
probability that they will intersect within the curve?

Let p be the length of the normal on the first secant, from
a chosen origin within the oval, let θ be the angle which
this perpendicular makes with a fixed direction, l the length
of the chord; the probability that a second secant shall cross
this chord is

$$\frac{d\theta}{\pi}\,\frac{dp}{L}\,\frac{2l}{s}.$$

Here L is the distance between the two tangents parallel
to the chord, and s is the perimeter of the curve. Now

$$\int_0^L l\,dp = \text{Area},$$

hence the probability we seek is

$$\frac{2\,\text{Area}}{s}\int_0^\pi \frac{d\theta}{\pi L}.$$

We must find this integral by an indirect method. We see,
in fact, that 1/L is the probability that a secant in the given
direction, which crosses the given oval, should also cross
a circle of diameter 1 within the oval, and by 1]

$$\int_0^\pi \frac{d\theta}{\pi L} = \frac{\pi}{s}.$$

The probability sought is, therefore, 2π Area/s².

In the case of a circle this is ½. When the area is given,
the circumference is a minimum when the oval is a circle.
We thus reach a rather curious result:

Theorem 5] The probability that two random secants of an
oval should intersect is equal to one half when the oval
is a circle, and less in every other case.
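The value 2π Area/s² is readily tabulated for particular ovals, and for the circle it may be tested by trial. The following Python sketch is an illustration of ours, with assumed function names, sample size, and seed. Secants of the unit circle are drawn with the normal distance p uniform on (−1, 1) and the angle θ uniform on (0, π); the two lines are solved for their common point, and the frequency with which that point falls inside the circle is compared with one half.

```python
import math
import random


def secant_intersection_probability(area: float, perimeter: float) -> float:
    """Chance that two random secants of an oval meet inside it: 2*pi*Area / s^2."""
    return 2.0 * math.pi * area / perimeter ** 2


def simulate_unit_circle(trials: int = 100_000, seed: int = 1) -> float:
    """Frequency with which two random secants of the unit circle meet within it."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        p1, t1 = rng.uniform(-1.0, 1.0), rng.uniform(0.0, math.pi)
        p2, t2 = rng.uniform(-1.0, 1.0), rng.uniform(0.0, math.pi)
        det = math.sin(t2 - t1)  # determinant of the two line equations
        if abs(det) < 1e-12:
            continue  # parallel secants never meet
        # Solve x*cos(t) + y*sin(t) = p for the two lines by Cramer's rule.
        x = (p1 * math.sin(t2) - p2 * math.sin(t1)) / det
        y = (p2 * math.cos(t1) - p1 * math.cos(t2)) / det
        if x * x + y * y < 1.0:
            inside += 1
    return inside / trials


if __name__ == "__main__":
    print(secant_intersection_probability(math.pi, 2.0 * math.pi))  # circle
    print(secant_intersection_probability(1.0, 4.0))                # unit square
    print(simulate_unit_circle())
```

For a square of side 1 the formula gives π/8, which is less than one half, as the theorem requires.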


Problems. 

1.  Find  the  analogues  in  3  dimensions  to  Theorems  1-4,  and  Examples 
3-6. 

2.  A  die  is  thrown  on  a  board  ruled  with  parallel  lines  whose  distance 
is  greater  than  a  diagonal  of  a  face  of  the  die.  Find  the  probability  that  it 
will  cross  a  ruling. 

3. Find the probability that a line shall intersect two ovals with two
common  points. 

4.  Find  the  probability  that  a  line  shall  intersect  two  ovals  with  four 
common  points. 


CHAPTER    VI 
THE  PROBABILITY   OF   CAUSES 

The form in which we have so far studied problems of probability
is not always that in which they present themselves
in practice. We have assumed that we knew just which were
the equally likely ways in which an event might happen, or
the proper independent variables when the number was
infinite, and have calculated the probability or frequency ratio
from them. But it often happens in practice that what we
know is merely an empirical approximation to the frequency
ratio from a limited number of cases, and what we wish to
find out is the likelihood that the actual probability should
lie within certain assigned limits. To put the matter in concrete
form, we saw (p. 50) that Buffon threw a coin 4,040
times and saw 2,048 heads. What we wish to know is the
likelihood that that series arose from throwing a good coin.

We have already learnt one method of meeting the problem,
namely, to assume that the coin is good, and calculate the
probability that the discrepancy will be as large as, or larger
than, that observed. That does not, however, cover the question
entirely. It is one thing to say that if a coin be good
the discrepancy will attain a certain figure a certain proportion
of the time, it is quite another to say that when a certain
discrepancy has been observed a large number of times when
a coin of unknown constitution was thrown, a certain proportion
of the trials were in all probability made with a good
coin. It is the latter fraction, not the former, which answers
the question, 'What is the probability that the coin was
good?'

When we have once grasped the real bearing of the question
of the likelihood of a good or bad coin, we see immediately
that there are two essential elements in the question:

A) The probability that Buffon should pick up a good or
bad coin to perform the experiments. Assuming that Buffon's
good faith is indubitable, this will depend upon the proportion
of coins in circulation at his time which were good, at least
for the purposes of such a trial.

B) The probability that if he threw a good coin, he would
obtain as large a discrepancy as was observed.

If it were perfectly certain that no bad coins were in circulation
at that time, it is clear that the problem would be
meaningless, but that is by no means sure. In the same way,
if it were absolutely impossible for a good coin to produce an
observed result, the problem would have no sense. As both
possibilities are open, we are face to face with a real problem.

We shall mean by the cause of an event, any antecedent
event whatever. We mean by the a priori probability that a
certain cause should be operative before the event in question
has been observed, the limit of the ratio of the number of
occasions where the causal event happened to the number of
cases where it happened or failed, as this latter number is
indefinitely increased. We mean by the probability that a given
cause should produce an observed result, the limit of the ratio of
the number of times where the causal event was followed by the
observed result, to the total number of times when the causal
event was operative. The reader will not forget that, according
to our first empirical axiom, all trials must be made under
the same essential conditions. Consequently, in determining
the probability that a certain cause should be operative, or
that it should produce a certain result, we must assume that
no other essential features in the situation have been allowed
to vary.

Suppose that there is a certain finite class of causes
C₁, C₂, ..., Cₙ which might be followed by a certain event, and
that they are mutually exclusive, yet one of them must have
happened. What is the probability that the actual cause
was Cₖ?

Let the a priori probabilities for the various causes be
π₁, π₂, ..., πₙ, while the respective probabilities that they should
be followed by the observed event are p₁, p₂, ..., pₙ. Let P be
the probability sought.

The probability that cause Cₖ should occur, and should produce
the observed event is πₖpₖ. But this probability may
be reckoned otherwise. It is the probability that the event
should happen, namely, π₁p₁ + π₂p₂ + ... + πₙpₙ, multiplied by
the probability P that it should arise from the cause in
question. This gives

Bayes' Principle *] If C₁, C₂, ..., Cₙ be the total number of mutually
exclusive causes of a certain class for an observed
event, one of which must have occurred, if π₁, π₂, ..., πₙ be
their respective a priori probabilities, while p₁, p₂, ..., pₙ
are the various probabilities that they should be followed
by the event, then the probability that the operative cause
was Cₖ is

$$P = \frac{\pi_k\,p_k}{\sum_{K=1}^{n}\pi_K\,p_K}. \qquad (1)$$
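Bayes' principle for a finite class of causes translates directly into a few lines of Python. The sketch below is ours; the function name and the three causes with their numerical values are hypothetical and chosen only to show the arithmetic. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' principle: probability of each cause after the event is observed.

    priors      -- a priori probabilities of the mutually exclusive causes
    likelihoods -- probabilities that each cause is followed by the event
    """
    joint = [pi * p for pi, p in zip(priors, likelihoods)]
    total = sum(joint)  # probability that the event should happen at all
    if total == 0:
        raise ValueError("the observed event is impossible under every cause")
    return [j / total for j in joint]


# Hypothetical illustration with three causes.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(1, 1)]
after = posterior(priors, likelihoods)

if __name__ == "__main__":
    print(after)
```

In this illustration the cause which was a priori the most likely becomes the least likely once the event has been seen, because it so rarely produces that event.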

We  shall  give  a  statement  of  this  principle  in  the  case  of 
continuous  probability,  as  this  will  be  of  use  later.  We  pass 
to  it  by  the  usual  process  of  passing  over  from  a  sum  to  a 
definite  integral. 

Bayes' principle for continuous probability] If all causes of a
certain class for an observed event, which are mutually
exclusive, yet one of which must have occurred, depend
analytically upon n independent variables z₁, z₂, ..., zₙ in
such a way that the a priori probability that these
variables take values in the infinitesimal interval
z₁ ± ½dz₁, z₂ ± ½dz₂, ..., zₙ ± ½dzₙ differs by an infinitesimal
of higher order from f(z₁, z₂, ..., zₙ) dz₁ dz₂ ... dzₙ, f
being an analytic function, while the probability that
the observed event shall then follow is φ(z₁, z₂, ..., zₙ), then
the probability that the event was produced by a cause
corresponding to variables in a certain region R is

$$\frac{\int_R f\,\phi\;dz_1\,dz_2\cdots dz_n}{\int_T f\,\phi\;dz_1\,dz_2\cdots dz_n}, \qquad (2)$$

where the integral in the denominator is taken over the
total field of variation T of the variables compatible with
the problem.

* Bayes, 'An essay towards solving a problem in the Doctrine of Chances',
Philosophical Transactions Royal Soc., vol. liii, 1763, and 'A Demonstration of
the Second Rule, &c.', ibid., vol. liv, 1764. Czuber, Entwickelung der
Wahrscheinlichkeitsrechnung, cit. p. 253, gives the erroneous dates 1764 and 1765.

As a first application of Bayes' principle we take a well-known
paradox of Bertrand's known as his 'box paradox'.*

Three boxes look exactly alike. Each contains two drawers,
and in each drawer is a coin. In the first box there are two
gold coins, in the second a gold and a silver coin, in the third
two silver coins. A box is chosen at random and a drawer
opened: what is the probability that the coin in the other
drawer of the same box is of the opposite metal?

First reasoning. This can only happen if we have hit upon
the second box; the chance for that is ⅓.

Second reasoning. There is a ½ chance that the coin first
seen shall be gold. When gold has been seen, we know that
we have chosen one of the first two boxes, but we do not know
which, they are equally likely, hence the chance for a gold
coin followed by a silver is ¼. There is an equal chance for a
silver coin followed by a gold. Hence the total chance is ½.

It is evident that the first answer is right and the second
wrong. The question is, What was wrong with the reasoning
in the second case? Here is the flaw. If a gold coin has
been seen, the a priori chance for the first or the second box
is ½, but whereas the first has a chance 1 of showing a gold
coin the first time, the second has only a chance ½ of doing so.
The probability that the gold coin is in the second box is

$$\frac{\frac{1}{2}\cdot\frac{1}{2}}{\frac{1}{2}\cdot 1 + \frac{1}{2}\cdot\frac{1}{2}} = \frac{1}{3},$$

and there is a similar probability for a silver coin. This leads
to the correct answer again.
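The box paradox is small enough to be settled by direct enumeration. The Python sketch below is our own; the names of the boxes and of the functions are assumptions made for illustration. It treats the three boxes as causes with a priori probability ⅓ each, which comes to the same thing as the ½ and ½ used above once the third box is excluded, and applies Bayes' principle after a coin has been seen.

```python
from fractions import Fraction

# Contents of the two drawers of each box: G for gold, S for silver.
BOXES = {
    "gold-gold": ("G", "G"),
    "gold-silver": ("G", "S"),
    "silver-silver": ("S", "S"),
}


def posterior_given_first_coin(seen: str) -> dict:
    """Probability of each box after a coin of the given metal has been seen."""
    prior = Fraction(1, len(BOXES))
    # Chance that a box shows the metal seen = share of its drawers holding it.
    joint = {name: prior * Fraction(coins.count(seen), 2) for name, coins in BOXES.items()}
    total = sum(joint.values())
    return {name: weight / total for name, weight in joint.items()}


def chance_other_coin_differs() -> Fraction:
    """Enumerate every box and drawer: chance that the unopened drawer differs."""
    total = Fraction(0)
    for coins in BOXES.values():
        for opened in (0, 1):
            if coins[opened] != coins[1 - opened]:
                total += Fraction(1, 3) * Fraction(1, 2)
    return total


if __name__ == "__main__":
    print(posterior_given_first_coin("G"))
    print(chance_other_coin_differs())
```

Both routes give ⅓ for the mixed box, in agreement with the first reasoning.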

* Bertrand, loc. cit., p. 2.

Example 1] An urn contains N balls, black and white, in
unknown proportion. A ball is drawn out n times
and replaced, the balls being mixed after each drawing,
with the result that just r white balls are seen. What
is the probability that the urn contains exactly R white
balls?

Hypothesis 1] All mixtures of white and black are equally
likely a priori. Then all of the π's are equal and will cancel
out, and we have, by (1)

$$\frac{\dfrac{n!}{r!\,(n-r)!}\left(\dfrac{R}{N}\right)^{r}\left(\dfrac{N-R}{N}\right)^{n-r}}{\displaystyle\sum_{K=0}^{N}\dfrac{n!}{r!\,(n-r)!}\left(\dfrac{K}{N}\right)^{r}\left(\dfrac{N-K}{N}\right)^{n-r}} = \frac{R^{r}(N-R)^{n-r}}{\displaystyle\sum_{K=0}^{N}K^{r}(N-K)^{n-r}}.$$

What value of R will be the most likely? We obtain a ready
and sufficiently accurate answer by equating to 0 the derivative
with respect to R of the logarithm. This gives

$$\frac{r}{R} = \frac{n-r}{N-R} = \frac{n}{N}.$$

We may, then, say that the most likely mixture is that
where the actual proportion of white balls is the observed
proportion, and this, indeed, is just what we should expect.
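Under Hypothesis 1 the whole table of probabilities for R can be written down exactly. The Python sketch below is an illustration of ours; the function name and the numbers N = 10, n = 5, r = 2 are hypothetical. It computes the weights Rʳ(N − R)ⁿ⁻ʳ, divides by their sum, and picks out the most likely composition.

```python
from fractions import Fraction


def urn_posterior_uniform(N: int, n: int, r: int) -> list:
    """Probability of each number R = 0..N of white balls, all mixtures being
    equally likely a priori, after r whites are seen in n drawings with replacement."""
    weights = [Fraction(R) ** r * Fraction(N - R) ** (n - r) for R in range(N + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical illustration: 10 balls, 5 drawings, 2 whites seen.
table = urn_posterior_uniform(N=10, n=5, r=2)
most_likely_R = max(range(len(table)), key=lambda R: table[R])

if __name__ == "__main__":
    print(most_likely_R, float(table[most_likely_R]))
```

Here the most likely number of white balls is 4, so that R/N = r/n exactly, as the derivative rule predicts.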

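Hypothesis 2] The urn was filled by drawing white and
black balls at random from an extremely large number of
balls where the two colours were found in equal profusion.*

Here we have

$$\frac{\dfrac{N!}{R!\,(N-R)!}\,\dfrac{1}{2^{R}}\,\dfrac{1}{2^{N-R}}\,\dfrac{n!}{r!\,(n-r)!}\left(\dfrac{R}{N}\right)^{r}\left(\dfrac{N-R}{N}\right)^{n-r}}{\displaystyle\sum_{K=0}^{N}\dfrac{N!}{K!\,(N-K)!}\,\dfrac{1}{2^{K}}\,\dfrac{1}{2^{N-K}}\,\dfrac{n!}{r!\,(n-r)!}\left(\dfrac{K}{N}\right)^{r}\left(\dfrac{N-K}{N}\right)^{n-r}} = \frac{\dfrac{R^{r}(N-R)^{n-r}}{R!\,(N-R)!}}{\displaystyle\sum_{K=0}^{N}\dfrac{K^{r}(N-K)^{n-r}}{K!\,(N-K)!}}.$$

The probability that R should be close to N/2 + z√(N/2) was
found, by the reasoning which led to the probability integral,
to be

$$\frac{1}{\sqrt{\pi}}\,e^{-z^2}\,\Delta z.$$

We wish to maximize

$$e^{-z^2}\left(\frac{R}{N}\right)^{r}\left(\frac{N-R}{N}\right)^{n-r},$$

or else

$$e^{-z^2}\left(\frac{1}{2} + \frac{z}{\sqrt{2N}}\right)^{r}\left(\frac{1}{2} - \frac{z}{\sqrt{2N}}\right)^{n-r}.$$

Equating to zero the derivative of the logarithm

$$\frac{r}{\sqrt{N/2} + z} - \frac{n-r}{\sqrt{N/2} - z} = 2z,$$

$$(2r-n)\sqrt{\frac{N}{2}} - nz = 2z\left(\frac{N}{2} - z^2\right).$$

Since z√(N/2) is to be of reasonable size, we may assume
z²/N is negligible, and reject z² as compared with N/2. We get

$$z = \frac{2r-n}{N+n}\sqrt{\frac{N}{2}}.$$

The most likely composition is

$$\frac{R}{N} = \frac{1}{2} + \frac{z}{\sqrt{2N}} = \frac{1}{2}\,\frac{N+2r}{N+n}.$$

This varies between ½ and r/n as we should expect.

* Bertrand, loc. cit., p. 162.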
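The approximate rule R/N = ½(N + 2r)/(N + n) may be compared with the exact maximum of Rʳ(N − R)ⁿ⁻ʳ/(R!(N − R)!). The Python sketch below is ours; the function names and the numbers N = 100, n = 50, r = 40 are hypothetical. Logarithms are used to avoid enormous factorials, and the search is confined to 0 < R < N, which supposes that both colours have been seen (0 < r < n).

```python
import math


def log_weight(R: int, N: int, n: int, r: int) -> float:
    """Logarithm of R^r (N-R)^(n-r) / (R! (N-R)!), for 0 < R < N."""
    return (r * math.log(R) + (n - r) * math.log(N - R)
            - math.lgamma(R + 1) - math.lgamma(N - R + 1))


def most_likely_R(N: int, n: int, r: int) -> int:
    """Exact most likely number of white balls under Hypothesis 2."""
    return max(range(1, N), key=lambda R: log_weight(R, N, n, r))


def approximate_R(N: int, n: int, r: int) -> float:
    """The approximate rule R = N (N + 2r) / (2 (N + n))."""
    return N * (N + 2 * r) / (2 * (N + n))


if __name__ == "__main__":
    print(most_likely_R(100, 50, 40), approximate_R(100, 50, 40))
```

The observed proportion 40/50 would suggest 80 white balls, and the a priori law 50; the most likely composition lies between the two.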

This first example leads us naturally to the idea of establishing
some general formula for the probability of causes analogous
to the Bernoulli formula. Suppose that an event has
succeeded np times and failed nq times in n trials. If all
probabilities for success be a priori equally likely, the most
likely probability for success is p. What is the chance that
the observed series resulted from the operations of a cause
which gave a probability of success lying between

$$p + t_1\sqrt{\frac{2pq}{n}} \quad\text{and}\quad p + t_2\sqrt{\frac{2pq}{n}}\,?$$

The probability is

$$\frac{\displaystyle\sum_{P\,=\,p+t_1\sqrt{2pq/n}}^{p+t_2\sqrt{2pq/n}}\frac{n!}{(np)!\,(nq)!}\,P^{np}(1-P)^{nq}}{\displaystyle\sum_{P=0}^{1}\frac{n!}{(np)!\,(nq)!}\,P^{np}(1-P)^{nq}}.$$

With regard to the summations, which are rather meaningless
as they stand, we assume that nP is an integer, so that
P increases by 1/n each time.

Let us write

$$P = p + z\sqrt{\frac{2pq}{n}}, \qquad \Delta z = \frac{1}{\sqrt{2npq}}.$$

We multiply every term above and below by this and
cancel the factorials, getting

$$\frac{\displaystyle\sum_{P\,=\,p+t_1\sqrt{2pq/n}}^{p+t_2\sqrt{2pq/n}}P^{np}(1-P)^{nq}\,\Delta z}{\displaystyle\sum_{P=0}^{1}P^{np}(1-P)^{nq}\,\Delta z}.$$

Divide numerator and denominator by p^{np} q^{nq}. Each term
then takes the form

$$\left(1 + z\sqrt{\frac{2q}{np}}\right)^{np}\left(1 - z\sqrt{\frac{2p}{nq}}\right)^{nq}\Delta z,$$

and the logarithm of the product of the two powers is −z², plus
terms which vanish as n increases.
Our fraction above will approach asymptotically to

$$\frac{\displaystyle\sum_{z=t_1}^{t_2}e^{-z^2}\,\Delta z}{\displaystyle\sum_{z=-\sqrt{np/2q}}^{\sqrt{nq/2p}}e^{-z^2}\,\Delta z}.$$

The limit of this as n increases indefinitely is

$$\frac{\displaystyle\int_{t_1}^{t_2}e^{-z^2}\,dz}{\displaystyle\int_{-\infty}^{\infty}e^{-z^2}\,dz} = \frac{1}{\sqrt{\pi}}\int_{t_1}^{t_2}e^{-z^2}\,dz.$$

We thus get

Theorem 1] If in a large number n of trials where all probabilities
for an individual success are a priori equally
likely, there be np successes and nq failures, then the
probability that the cause is such as to give a probability
of success lying between the limits

$$p + t_1\sqrt{\frac{2pq}{n}}, \qquad p + t_2\sqrt{\frac{2pq}{n}}$$

is

$$\tfrac{1}{2}\left[\Theta(t_2) - \Theta(t_1)\right]. \qquad (3)$$

Converse of Bernoulli's theorem] Under the conditions of
Theorem 1, the probability that the cause is such as to
give a probability of success in the limits p ± t√(2pq/n) is

$$\Theta(t). \qquad (4)$$
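The converse of Bernoulli's theorem is an asymptotic statement, and it is instructive to see how nearly it holds for a moderate n. In the Python sketch below, which is ours, Θ(t) is identified with the library function math.erf, that is (2/√π)∫₀ᵗ e^{−z²}dz; this identification, the function name, the step count, and the sample numbers (300 successes, 700 failures) are assumptions. The exact a posteriori probability is found by integrating x^{np}(1 − x)^{nq} numerically with the midpoint rule.

```python
import math


def posterior_in_limits(successes: int, failures: int, t: float, steps: int = 20_000) -> float:
    """Chance, all probabilities being equally likely a priori, that the
    probability of success lies within p +/- t*sqrt(2pq/n).
    Both counts are assumed to be positive."""
    n = successes + failures
    p, q = successes / n, failures / n
    half_width = t * math.sqrt(2.0 * p * q / n)
    # Subtract the logarithm of the peak value so that exp() does not underflow.
    peak = successes * math.log(p) + failures * math.log(q)

    def integral(a: float, b: float) -> float:
        h = (b - a) / steps
        total = 0.0
        for i in range(steps):
            x = a + (i + 0.5) * h  # midpoints never touch 0 or 1
            total += math.exp(successes * math.log(x)
                              + failures * math.log(1.0 - x) - peak)
        return total * h

    lo = max(0.0, p - half_width)
    hi = min(1.0, p + half_width)
    return integral(lo, hi) / integral(0.0, 1.0)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, posterior_in_limits(300, 700, t), math.erf(t))
```

The two columns printed should agree closely for n = 1,000, the small remaining difference coming from the terms neglected in passing to the limit.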

We must now face the possibility that all causes are not
equally likely a priori. We are thrown back upon our
formula (2), which we simplify by the law of the mean

$$\int_{t_1}^{t_2} f(z)\,\phi(z)\,dz = f[t_1 + h(t_2 - t_1)]\int_{t_1}^{t_2}\phi(z)\,dz.$$

The most interesting case is to compare the probabilities
under the conditions of Theorem 1], except the a priori condition,
that t should lie in the region between t₁ and t₂, or else
between t₁' and t₂'. The ratio will be

$$\frac{f[t_1 + h(t_2 - t_1)]}{f[t_1' + h'(t_2' - t_1')]}\cdot\frac{\Theta(t_2) - \Theta(t_1)}{\Theta(t_2') - \Theta(t_1')}. \qquad (5)$$

When the regions are both very small, we may put

$$t_1 + t_2 = 2t; \quad t_2 - t_1 = 2\Delta t; \quad t_1' + t_2' = 2t'; \quad t_2' - t_1' = 2\Delta t';$$

the ratio then takes the simpler form:

$$\frac{f(t)}{f(t')}\cdot\frac{e^{-t^2}\,\Delta t}{e^{-t'^2}\,\Delta t'}. \qquad (6)$$
We shall apply this to an amusing problem proposed by
Bertrand.*

'The owner of a gaming establishment has installed a roulette
wheel. In 10,000 turns this has shown red 5,300 times,
black 4,700 times. The owner refuses to pay for the wheel
and claims damages; his clients have noticed that the wheel
seems to favour red. They go to law about it. The owner
claims that a good wheel was never known to show such a
discrepancy. 300 turns in 10,000 cannot be the result of
chance. The chance for red is not ½, as it ought to be. "Never
mind the record of the turns so far," says the maker, "you
cannot insure against the caprices of fortune. The machine
was made by excellent workmen, and was carefully inspected.
No part of it is imperfect. There is no bad centring of any
wheel, no inequality in the size of the divisions, no error in
levelling." The Court calls in an expert; what should he
say?'

According to the maker, the probability for red lies between
0.499 and 0.501; according to the owner it lies between 0.529
and 0.531. We have

$$\sqrt{\frac{2pq}{n}} = \frac{1}{100}\sqrt{0.53\times 0.94} = 0.0070583,$$

$$0.53 + 0.0070583\,t_1 = 0.501,$$

$$t_1 = -4.10862, \quad t_2 = -4.39197, \quad t = -4.25030, \quad \Delta t = 0.1416,$$

$$0.53 + 0.0070583\,t_1' = 0.531,$$

$$t_1' = 0.1416, \quad t_2' = -0.1416, \quad t' = 0, \quad \Delta t' = 0.1416.$$

We thus need to find

$$\frac{f(t)}{f(0)}\,e^{-(4.2503)^2},$$

and we find from the table on p. 208 that this is

$$\frac{f(t)}{f(0)}\times 0.000000014.$$

Bertrand's solution is simpler. He compares the probabilities
under the two hypotheses of getting just this result,
namely,

$$\frac{f(t)}{f(0)}\cdot\frac{(0.5)^{5300}\,(0.5)^{4700}}{(0.53)^{5300}\,(0.47)^{4700}}.$$

It will readily be granted that if the maker of the wheel be
known to be careful and conscientious, f(t) will be many
times larger than f(0). It is hard to believe, however, that
the ratio of the two would be large enough to bring the product
up to respectable size. The expert would doubtless
decide against the maker.

* Bertrand, loc. cit., p. 166.

Problem.

Discuss Buffon's coin and Lazzerini's needle by the methods of the
present chapter.
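The figures in the roulette discussion may be rechecked without a table. The Python sketch below is ours; the variable names are assumptions, and the count of 5,300 reds is taken so that the two colours together make up the 10,000 turns. It recomputes the unit √(2pq/n), the limits t₁ and t₂ for the maker's hypothesis, the factor e^{−t²} of formula (6), and the corresponding factor in Bertrand's simpler comparison.

```python
import math

n, red, black = 10_000, 5_300, 4_700
p, q = red / n, black / n

# Unit in which the discrepancy is measured: sqrt(2pq/n).
scale = math.sqrt(2.0 * p * q / n)

# Maker's hypothesis: the chance for red lies between 0.499 and 0.501.
t1 = (0.501 - p) / scale
t2 = (0.499 - p) / scale
t = (t1 + t2) / 2.0

# Factor multiplying f(t)/f(0) in formula (6); the two delta-t's are equal and cancel.
gauss_factor = math.exp(-t * t)

# Bertrand's comparison: chance of exactly this record under 1/2 against 0.53.
log_bertrand = n * math.log(0.5) - red * math.log(p) - black * math.log(q)
bertrand_factor = math.exp(log_bertrand)

if __name__ == "__main__":
    print(scale, t1, t2, t)
    print(gauss_factor, bertrand_factor)
```

Both factors come out at a little over one part in a hundred million, so that the two methods tell the expert the same story.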

There is another point that should be noted in this connexion,
which is rather subtle and easily overlooked. We
have no right to settle after the event what constitutes really
a remarkable run. Let us return to Buffon and his coin. It
will be noted that the discrepancy was 28, and this is exactly
the year of the Christian era when John the Baptist was cast
into prison. Let us examine the probability that the coin was
so constructed as to show this date when thrown that number
of times. It is easy to calculate the probability that a coin
giving to heads the probability 512/1010 should show no discrepancy
in 4,040 throws, and this is considerably greater than
the probability that a good coin should show exactly the discrepancy
28. But the a priori probability that a coin should
be so constructed as to predict the date of John the Baptist's
imprisonment is so microscopic compared with the probability
that a coin should be good, that we reject the former hypothesis
without more discussion. The reader will find it
amusing to apply this type of reasoning to such problems as
the probability that the great Pyramid was specially placed
by Divine Providence to reveal the value of π, the length of
the British inch, and other interesting facts which Piazzi
Smyth and others have deduced from its measurements.

Bayes' principle has sometimes been used to deduce the
probability for future events. The reader will have no
difficulty in proving:

Bayes' principle applied to future events] If C₁, C₂, ..., Cₙ be the
total number of mutually exclusive causes for an
observed event, one of which must have occurred, if
π₁, π₂, ..., πₙ be their respective a priori probabilities,
p₁, p₂, ..., pₙ the various probabilities that they should be
followed by the observed event, while P₁, P₂, ..., Pₙ are the
respective probabilities that they shall be followed by an
expected event, then the probability that the expected
event shall take place is

$$\frac{\sum_{K=1}^{n}\pi_K\,p_K\,P_K}{\sum_{K=1}^{n}\pi_K\,p_K}. \qquad (7)$$
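The principle for future events is a weighted mean of the probabilities Pₖ, the weights being the products πₖpₖ. The Python sketch below is ours; the function name is an assumption, and the illustration reuses Bertrand's three boxes as the causes: a gold coin has been seen, and the expected event is that the other drawer of the same box also holds gold.

```python
from fractions import Fraction


def predictive(priors, likelihoods, future):
    """Probability of the expected event after the observed event.

    priors      -- a priori probabilities of the causes
    likelihoods -- probabilities that each cause gives the observed event
    future      -- probabilities that each cause gives the expected event
    """
    numerator = sum(pi * p * P for pi, p, P in zip(priors, likelihoods, future))
    denominator = sum(pi * p for pi, p in zip(priors, likelihoods))
    return numerator / denominator


# Boxes: gold-gold, gold-silver, silver-silver; a gold coin has been seen.
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1), Fraction(1, 2), Fraction(0)]
# Chance that the other drawer holds gold, given the box and the gold coin seen.
future = [Fraction(1), Fraction(0), Fraction(0)]
other_is_gold = predictive(priors, likelihoods, future)

if __name__ == "__main__":
    print(other_is_gold)
```

The chance that the second coin is also gold is ⅔, the complement of the ⅓ found for the opposite metal.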

In the same way we may prove:

Bayes' principle for continuous probability applied to future
events] If all causes of a certain class for an observed
event, which are mutually exclusive yet one of which
must have happened, depend analytically upon n
independent variables z₁, z₂, ..., zₙ in such a way that
the a priori probability that these variables take values
in the infinitesimal intervals

z₁ ± ½dz₁, z₂ ± ½dz₂, ..., zₙ ± ½dzₙ

differs by an infinitesimal of higher order from

f(z₁, z₂, ..., zₙ) dz₁ dz₂ ... dzₙ,

while the probability that the observed event shall then
follow is φ(z₁, z₂, ..., zₙ), and the probability for a future
event is ψ(z₁, z₂, ..., zₙ), then the total probability for the
occurrence of the future event is

$$\frac{\int f\,\phi\,\psi\;dz_1\,dz_2\cdots dz_n}{\int f\,\phi\;dz_1\,dz_2\cdots dz_n}, \qquad (8)$$

each integral being taken over the whole field of possible
values.

Example 2] If in n trials of an event for which all probabilities
are equally likely a priori, there have been just r
successes, what is the probability that there will be
just R in a further series of N trials?

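If $x$ be the probability for success, the probability will be,
by (8),

$$\frac{N!}{R!\,(N-R)!}\;\frac{\int_0^1 x^{R+r}(1-x)^{N+n-R-r}\,dx}{\int_0^1 x^{r}(1-x)^{n-r}\,dx}.$$

Now

$$\int_0^1 x^{l}(1-x)^{m}\,dx = \frac{l!\,m!}{(l+m+1)!}.$$

Hence our desired probability is

$$\frac{N!}{R!\,(N-R)!}\cdot\frac{(R+r)!\,\bigl(N+n-(R+r)\bigr)!}{(N+n+1)!}\cdot\frac{(n+1)!}{r!\,(n-r)!}. \qquad (9)$$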
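
Formula (9) is easy to evaluate exactly. The following sketch (the function name `future_successes` and the numbers are illustrative assumptions) checks that the probabilities for R = 0, 1, ... N add up to unity, and evaluates the case of a single further trial.

```python
from fractions import Fraction
from math import comb, factorial


def future_successes(n, r, N, R):
    """Formula (9): probability of exactly R successes in N further trials,
    after r successes in n trials, all probabilities of success being
    equally likely a priori."""
    first = Fraction(comb(N, R))
    second = Fraction(factorial(R + r) * factorial(N + n - R - r),
                      factorial(N + n + 1))
    third = Fraction(factorial(n + 1), factorial(r) * factorial(n - r))
    return first * second * third


# Hypothetical numbers: 7 successes in 10 trials, 5 further trials.
n, r, N = 10, 7, 5
distribution = [future_successes(n, r, N, R) for R in range(N + 1)]
total = sum(distribution)

# A single further trial: N = R = 1.
next_trial = future_successes(n, r, 1, 1)
print(total, next_trial)
```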


When all the numbers are large we may apply Stirling's
formula, getting

$$\frac{1}{\sqrt{2\pi}}\;\frac{N^{N+\frac12}\,(R+r)^{R+r+\frac12}\,(N+n-R-r)^{N+n-R-r+\frac12}\,(n+1)^{n+\frac32}}{R^{R+\frac12}\,(N-R)^{N-R+\frac12}\,(N+n+1)^{N+n+\frac32}\,r^{r+\frac12}\,(n-r)^{n-r+\frac12}}. \qquad (10)$$

When we are interested in only one further trial,

$$N = R = 1,$$

and (9) becomes

$$\frac{(r+1)!\,(n-r)!\,(n+1)!}{(n+2)!\,r!\,(n-r)!} = \frac{r+1}{n+2}. \qquad (11)$$

When the event has never failed so far, $r = n$ and we have

$$\frac{n+1}{n+2}. \qquad (12)$$

The  most  absurd  consequences  have,  in  the  past,  been 
deduced  from  this  formula.  Putting  n  equal  to  the  number 
of  times  the  sun  has  risen,  it  has  been  used  to  estimate  the 
probability that it will rise the next day. Nothing could be
more  grotesque.  The  rising  of  the  sun  is  not  a  statistical 
event  whose  cause  is  obscure,  but  a  mechanical  necessity 
which  will  continue  as  long  as  present  astronomical  conditions 
do,  and  will  then  cease.  To  use  formula  (12)  we  should  have 
to  assume  that  all  possible  cosmogonies  were  equally  likely. 
What  such  a  phrase  may  mean  is  utterly  beyond  our  com- 
prehension :  it  undoubtedly  means  nothing  whatsoever. 

The probability that exactly the same proportion of success
will appear in a second series of n trials as appeared in a first
series will be found from (9) by putting N = n, R = r:

$$\frac{n+1}{2n+1}\left[\frac{n!}{r!\,(n-r)!}\right]^{2}\left[\frac{(2r)!\,(2n-2r)!}{(2n)!}\right].$$

Replacing the first factor by $\tfrac12$, and approximating to the rest
by Stirling's formula, we have

$$\left(\frac{n}{4\pi r\,(n-r)}\right)^{\frac12}.$$

On the other hand, if we surely knew that the probability
for success was r/n, the probability for exactly r successes is
given by the last formula of Ch. III, § 2, namely

$$\left(\frac{n}{2\pi r\,(n-r)}\right)^{\frac12}.$$

The difference between the two arises from the fact that in
one case we are sure of the probability of success; in the
other, we only surmise it.*
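
The two approximations may be compared with the exact values. The sketch below is illustrative only; the function names and the choice n = 200, r = 80 are assumptions made for the purpose.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def same_proportion_exact(n, r):
    """Formula (9) with N = n, R = r (probability of success only surmised)."""
    return Fraction(n + 1, 2 * n + 1) * Fraction(comb(n, r) ** 2, comb(2 * n, 2 * r))


def same_proportion_approx(n, r):
    """Stirling approximation to the above."""
    return sqrt(n / (4 * pi * r * (n - r)))


def known_probability_exact(n, r):
    """Exactly r successes in n trials when the probability is surely r/n."""
    p = Fraction(r, n)
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


def known_probability_approx(n, r):
    """Stirling approximation to the above."""
    return sqrt(n / (2 * pi * r * (n - r)))


n, r = 200, 80  # hypothetical numbers, large enough for Stirling's formula
surmised = float(same_proportion_exact(n, r))
known = float(known_probability_exact(n, r))
print(surmised, same_proportion_approx(n, r))
print(known, known_probability_approx(n, r))
```

The surmised case gives the smaller probability, the approximate ratio of the two being the square root of 2.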

* This interesting comparison is taken from Czuber,
Wahrscheinlichkeitsrechnung, cit. p. 200.

It is perfectly evident that Bayes' principle is open to very
grave question, and should only be used with the greatest
caution. The difficulty lies with the a priori probabilities.
We generally have no real line on them, so take them all
equal. Suppose that n balls have been drawn at random
from an urn having white and black balls in unknown
mixture, and that a white ball has been drawn just r times.
What is the probability of drawing a white ball the next
time? We should like to use formula (11). When is it
safe to do so?

That formula was derived on the hypothesis that all mixtures
were, a priori, equally likely. That does not mean that
when we know nothing at all about an urn all mixtures are
equally likely. We have already discussed that meaning of
equally likely in Ch. I. What it does mean is this:* Imagine
an immense number of urns containing black and white balls
in varying proportions, but with a fixed number of urns with
each mixture. Then if an urn be drawn at random and
n drawings, with replacement, be made therefrom, showing
just r white balls, the probability that the next ball will be
white is accurately given by (11). It is only when we can
give a really precise statement of this sort that Bayes'
principle can be used with perfect confidence, and the cases
are rare.
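
The statement about the collection of urns can be imitated by a simulation. The sketch below is an illustration under stated assumptions: the proportion of white balls in the urn drawn is taken as uniformly distributed between 0 and 1 (an idealization of "a fixed number of urns with each mixture"), and the function name, seed and numbers are hypothetical.

```python
import random


def next_white_frequency(n, r, trials, rng):
    """Among simulated urns that showed just r white balls in n drawings with
    replacement, the fraction in which the next ball drawn was white."""
    matches = 0
    whites = 0
    for _ in range(trials):
        x = rng.random()  # proportion of white balls in the urn drawn at random
        drawn = sum(rng.random() < x for _ in range(n))
        if drawn == r:
            matches += 1
            if rng.random() < x:  # the next drawing
                whites += 1
    return whites / matches


rng = random.Random(0)  # fixed seed, so that the run can be repeated
n, r = 5, 3
frequency = next_white_frequency(n, r, 200_000, rng)
rule = (r + 1) / (n + 2)  # formula (11)
print(frequency, rule)
```

The observed frequency should lie close to the value given by (11), the agreement improving with the number of simulated urns.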

Why not, then, reject the formula outright? Because,
defective as it is, Bayes' formula is the only thing we have
to answer certain important questions which do arise in the
calculus of probability. The question as to the likelihood
that a coin which showed a given succession of heads and
tails should be bad is real and insistent. To say what might
reasonably have been expected from a good coin under the
circumstances does not, by any means, cover the case.
Therefore we use Bayes' formula with a sigh, as the only thing
available under the circumstances:

'Steyning tuk him for the reason the thief tuk the hot
stove — bekaze  there  was  nothing  else  that  season.'  f 

* Cf. Castelnuovo, loc. cit., p. 170.
† Kipling, Captains Courageous, ch. vi.


CHAPTER  VII 

ERRORS  OF  OBSERVATION 

§  1.    Determination  of  the  *Best  Value' 

There is no such thing as a perfect physical measurement.
Absolute  accuracy  is  a  fiction,  and  is  never  attained  in 
practice. What is meant by an 'exact value' is a value
which  is  sufficiently  exact  for  purposes  of  a  certain  class.  In 
fact  it  is  not  always  possible  to  say  what  is  meant  by  the 
'  true  value '  of  any  quantity.  What  is  the  true  length  of 
a  bar  of  iron  ?  That  will  depend  on  the  temperature  of  the 
iron ;  perhaps  on  the  direction  and  velocity  of  its  motion 
through  space,  if  the  recent  theories  of  relativity  be  correct. 
But  if  there  be  room  for  doubt  as  to  what  the  true  value 
really  is,  there  will  be  infinitely  more  about  any  attempts  to 
measure  it.  Suppose  that  we  say  that  two  towns  are  exactly 
three  and  one-half  miles  apart,  what  do  we  really  mean? 
Different  persons  will  mean  different  things  by  these  same 
words.  A  careless  person  might  mean  that  some  point  within 
a  few  rods  of  the  post  office  in  one  is  exactly  three  and  one- 
half  miles  from  some  point  within  a  few  rods  of  the  jail  in 
the  other,  but  such  a  statement  would  never  do  for  a  surveyor. 
If  he  said  that  the  towns  were  exactly  three  and  one-half 
miles  apart  he  would  mean  that  some  landmark,  as  a  mile- 
stone, in  one  was  separated  from  a  similar  landmark  in  the 
other  by  a  distance  within  a  few  inches  of  three  and  one-half 
miles.  The  geographical  meridian  of  Paris  runs  from  a  mark 
in  the  middle  of  a  doorway  on  the  south  side  of  the  Observatory 
to  a  short  vertical  iron  rod  in  the  middle  of  a  hole  in  a  stone 
column  erected  in  the  park  of  Mont  Souris.  This  extreme 
topological accuracy would be counted the height of carelessness
in a machine shop where lengths were measured to the nearest
thousandth of an inch, and machine-shop accuracy is nowhere near
sufficient for work in optics, where we think in terms of wave
lengths of light.

A  true  theory  of  physical  measurements  must  therefore 
start  from  the  assumption  that  they  always  contain  errors. 
To  what  are  these  errors  due  ?  A  few  moments'  reflection 
shows  that  they  fall  into  two  general  classes  : 

A)  Constant  errors.  These  are  due  to  inherent  imperfections 
in  the  instruments  of  observation,  and  in  the  observer,  but 
do  not  vary  from  one  observation  to  another  one  bearing  on 
the  same  object.  We  measure  distances  with  a  scale  whose 
indicated  lengths  are  too  short.  We  observe  the  altitude  of 
the  sun  with  a  sextant  whose  0  is  wrongly  placed.  We 
measure  a  time  interval  with  a  chronograph  which  gains  at 
a  constant  rate.  We  note  the  transit  of  a  star  across  the 
hair-line  when  our  personal  equation  causes  us  to  record 
the phenomenon too soon. Errors of this general sort are
inseparable  from  any  sort  of  physical  observation.  Neither 
the  instrument  nor  the  observer  can  be  perfected  to  such  an 
extent  as  to  eliminate  them  completely.  All  that  we  can  do 
is  to  estimate  them  as  accurately  as  possible  by  measuring 
quantities of known value, or by other means.

B) Accidental errors. These are supposed to arise from
minute  causes  which  vary  from  one  observation  to  another; 
they  are  fluctuating  variations  in  the  observer,  the  instru- 
ments, and  the  quantity  observed.  To  run  through  the  same 
list as before, the coefficient of expansion of the scale may be
different  from  that  of  the  quantity  measured,  and  the  tem- 
perature may  be  somewhat  above  or  below  the  mean.  In 
reading  the  vernier  of  a  sextant,  the  lines  nearest  coincidence 
will  differ  by  a  fraction  of  a  hair's  breadth,  one  way  or  the 
other.  The  chronograph  is  not  perfectly  sealed  from  the  outer 
air,  and  is  influenced  by  variations  of  temperature  and 
atmospheric  pressure.  The  observer's  nervous  reaction  is 
slightly  faster  or  slower  than  usual,  causing  a  variation  in 
the  rapidity  of  perceiving  the  passage  of  a  star  across  the 
spider  line. 

The fundamental problem with which we shall be occupied
in the present chapter is to formulate a general mathematical
theory of these accidental errors. At the outset it must be
understood, beyond all possibility of misconception, that any
such law will represent merely an approximation to the
truth. There is no answer to the question, 'Why should
accidental errors in different sorts of observations obey the
same law?' There is no reason why they should, and
undoubtedly they do not. The real question is: Can an
approximate law be found which is sufficiently accurate for
the purposes for which it is needed? The ultimate test
for such a law will be 'how well does it work out in practice?'
If  it  work  well,  it  is  a  good  law,  even  if  founded  on  assump- 
tions of  doubtful  validity.  If  it  work  badly,  then  it  is  of 
little  importance,  even  though  the  mathematical  deduction  be 
highly  instructive.  The  problem  is  to  make  the  broadest 
and  most  plausible  assumptions  which  will  lead  to  a  definite 
formula,  and  then  to  test  that  formula  in  practice. 

Assumption 1] The mean value of an accidental error is zero.

It  must  be  understood  that  this  is  just  an  arbitrary  assump- 
tion, like  those  on  which  elementary  geometry  is  based. 
That  it  is  a  plausible  one  is  seen  from  considering  the  opposite 
case.  For  if  this  mean  value  were  positive  or  negative,  there 
would  be  a  constant  tendency  towards  errors  of  the  one  sort 
or  the  other,  and  this  would  count  in  with  the  constant 
errors. 

Assumption  2]  The  probability  of  an  accidental  error  decreases 
as  the  numerical  magnitude  of  that  error  increases. 
There  is  a  so-called  proof  of  this  principle  which,  in  reality, 
is  based  upon  an  assumption  far  less  obvious  than  the  assump- 
tion in  question.  The  idea  is  that  each  accidental  error  is 
the  result  of  an  accumulation  of  atomic  errors  called  '  funda- 
mental errors ',  arising  from  small  independent  causes.  These 
fundamental  errors  are  supposed  to  be  of  the  same  size,  and 
each  has  an  equal  chance  of  being  positive  or  negative.  The 
error  actually  committed  represents  the  excess  of  positive 
over  negative  fundamental  errors,  or,  vice  versa,  it  is  pro- 
portional to  the  discrepancy  in  a  series  of  trials  where  there 
is a half chance of heads or tails, and we know already that
the chance for a discrepancy is less and less as the latter
increases numerically. The reason why we do not favour
this  method  of  treating  the  subject  is  that,  in  reality,  it  seems 
likely  that  these  fundamental  errors  are  a  pure  fiction,  and 
that  the  actual  errors  committed  do  not  arise  in  any  such  way. 
We  now  suppose  that  we  have  a  set  of  discordant  observa- 
tions of  the  same  quantity,  after  all  constant  errors  have  been 
eliminated    or    accounted    for.      The    obvious    fundamental 
question  is  this  :   What  value  shall  we  take  as  our  best  estimate 
of  the  quantity  ?     We  shall  answer  this  question  by  making 
certain  plausible  mathematical  assumptions  about  a  quantity 
which  we  shall  call  the  best  value,  and  show  how  this  latter 
can then be found. It must be understood that these assumptions
are nothing but definitions of what the words 'best value'
mean. This method of procedure seems to have been first
developed by the Italian astronomer Schiaparelli.* We shall
do our best to motivate our assumptions as we go along.

Postulate 1] When a number of discordant measures have
been made on the same magnitude, and constant errors
have been eliminated, the best value is a continuous
function of the measures, which possesses first partial
derivatives with respect to all the arguments.

The obvious objection has been made to this postulate
that it was not at all evident why this function should be
differentiable. Schimmack reached the same function by
somewhat different postulates which did not include
differentiability,† and his postulates have been shown by Beetle to
be completely independent.‡ But Schimmack makes the
assumption that the best value for n + 1 observations is what
it would be if each of the first n were replaced by the best
value for those n, and this does not seem at all self-evident
either. It is certainly hard to believe that the best value is
not a continuous function, and, if continuous, we can approach
to it with any degree of accuracy we desire by means of
differentiable functions, so that the inclusion of differentiability
does not add much 'to the load'.

* Schiaparelli, 'Sul principio della media aritmetica', Rendiconti del
R. Istituto Lombardo, Series 2, vol. ii, 1868, and 'Sur le principe de la moyenne
arithmétique', Astronomische Nachrichten, vol. lxxxvii, 1876 (Czuber,
Entwickelung, cit., gives for this the erroneous date of 1895). This last is a
refutation of a priority claim put forth in the same number by Stone, and based
upon his paper, 'On the most probable result which can be deduced from a number
of direct determinations of assumed equal values', Monthly Notices Royal
Astronomical Soc., vol. xxxiii, 1873.

† Schimmack, 'Der Satz vom arithmetischen Mittel', Math. Annalen, vol.
lxviii, 1909.

‡ Beetle, 'On the Complete Independence of Schimmack's Postulates',
ibid., vol. lxxvi, 1915.

Suppose  that  for  one  reason  or  another  we  decide,  in  the 
course  of  our  observations,  to  change  the  scale  or  unit.     We 
should  naturally  expect  to  produce  thereby  a  corresponding 
change  in  our  best  value.     This  leads  to 

Postulate 2] If all the observed values be multiplied by the
same constant factor, the best value will be multiplied
thereby.

We  naturally  look  upon  the  best  value  as  intrinsic  in  the 
observations,  and  independent  of  the  origin  whence  measure- 
ments are  made.     This  leads  to 

Postulate 3] If the same constant be added to each of the
observed values, that constant will be added to the best
value.

If the best value be a function only of the observations,
the order in which they are taken must not affect the latter.
This gives

Postulate  4]    When  all  the  measures  are  equally  trustworthy, 
the  best  value  is  a  symmetric  function  of  them. 

With these postulates, it is easy to determine what sort of
a function the best value is. Let the observed values, after
constant errors have been eliminated, be $x_1, x_2, \ldots x_n$. The
best value shall be $f(x_1, x_2, \ldots x_n)$. Since it is certainly
possible that all the observed values should be equal, the
function cannot become singular for every set of values
$x_1, x_2, \ldots x_n$. Hence, by change of origin we may assume
that the function and its derivatives all exist for the set of
values $0, 0, \ldots 0$. By postulate 2] and the law of the mean we have

$$k\,f(x_1, x_2, \ldots x_n) = f(kx_1, kx_2, \ldots kx_n) = f(0, 0, \ldots 0) + \sum_{i=1}^{n} k x_i \frac{\partial f}{\partial y_i},$$

where $0 < y_i < kx_i$ or $0 > y_i > kx_i$.
Putting $k = 0$,

$$f(0, 0, \ldots 0) = 0.$$

Dividing by $k$,

$$f(x_1, x_2, \ldots x_n) = \sum_{i=1}^{n} x_i \frac{\partial f}{\partial y_i}.$$

Since the left side is independent of $k$, on the right we may
put $k = 0$, and $y_i = 0$. If $\dfrac{\partial f}{\partial x_i} = a_i$ when $x_i = 0$,

$$f(x_1, x_2, \ldots x_n) = \sum_{i=1}^{n} a_i x_i.$$

Changing $x_i$ to $x_i + d$, and applying postulate 3],

$$\sum_{i=1}^{n} a_i (x_i + d) = \sum_{i=1}^{n} a_i x_i + d, \qquad \sum_{i=1}^{n} a_i = 1.$$

It is better to replace the coefficients $a_i$ by numbers
proportional to them and write the best value

$$f(x_1, x_2, \ldots x_n) = \frac{p_1 x_1 + p_2 x_2 + \cdots + p_n x_n}{p_1 + p_2 + \cdots + p_n} = \bar{x}. \qquad (1)$$

Theorem 1] If a set of discordant measurements be taken of
the same object, the best value to take, after constant
errors have been eliminated, is a homogeneous linear
function of the measures, where the sum of the coefficients
is equal to unity.

We  find  immediately  from  postulate  4] 

Theorem  2]  When  all  of  the  measurements  are  equally  trust- 
worthy, the  best  value  is  their  average. 
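
The postulates can be checked numerically against formula (1). The sketch below is only an illustration; the function name `best_value`, the measures and the weights are hypothetical.

```python
def best_value(values, weights=None):
    """Weighted mean, formula (1); equal weights give the plain average."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(p * x for p, x in zip(weights, values)) / sum(weights)


# Hypothetical measures of one distance, with hypothetical weights.
measures = [3.52, 3.48, 3.55, 3.50]
weights = [2.0, 1.0, 1.0, 3.0]

base = best_value(measures, weights)

# Postulate 2: a change of unit multiplies the best value by the same factor.
scaled = best_value([1.609 * x for x in measures], weights)

# Postulate 3: a change of origin adds the same constant to the best value.
shifted = best_value([x + 10.0 for x in measures], weights)

# Postulate 4 and Theorem 2: with equal weights the order is immaterial,
# and the best value is the average.
average = best_value(measures)
reordered = best_value(list(reversed(measures)))

print(base, scaled, shifted, average)
```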

The coefficients are called weights, and it is evident in
formula (1) that we are not primarily concerned with their
actual values, but with their ratios. We shall also, hereafter,
refer to the 'best value' as the weighted mean. Suppose,
further, that $x_1$ was found as the average of $n_1$ standard
observations, $x_2$ as the average of $n_2$ of them, and $x_n$ as the
average of $n_n$. The weighted mean of all of the standard
observations would be the expression (1) where the letter $p_i$
was replaced by the corresponding letter $n_i$.

Theorem 3] If it be possible to express each measurement as
the average of a certain number of standard observations,
then the weights in the weighted mean are
proportional to the number of standard observations
in each case.

We  must  now  give  a  number  of  definitions  which  will  be 
of  frequent  use  in  what  follows. 

Definition] The positive square root of the mean value of the
square of an error which may occur in a series of like
observations is called the mean error.

Definition] The mean value of the numerical measure of the
error is called the average error.*

Definition]  The  positive  number  which  there  is  a  half  chance 
that  the  numerical  value  of  the  error  will  not  exceed  is 
called  the  probable  error. 

The  reader  should  compare  these  definitions  with  those  of 
mean,  average,  and  probable  discrepancies  on  pp.  49,  50.  We 
shall see later that when the errors are distributed according
to  the  exponential  law  of  Gauss,  these  three  are  constant 
multiples  one  of  another. 

Let the real unknown value of the quantity we are measuring
be $x$. The error of the weighted mean is

$$\frac{\sum_{i=1}^{n} p_i (x_i - x)}{\sum_{i=1}^{n} p_i}.$$

It is a little more convenient to write this in the form

$$\sum_{i=1}^{n} a_i (x_i - x), \qquad \sum_{i=1}^{n} a_i = 1.$$

The mean value of each of these terms is, by Assumption 1],
equal to 0.

* There is no complete agreement as to these definitions. Some books use
the term 'mean error' for that which we have called 'average error'. Others
call our mean error, which is really rather ill-named, 'root mean square', a
ponderous title.

Let us assume that the unknown mean error of the measurement
$x_i$ is

$$\frac{1}{k_i\sqrt{2}}.$$

The reasons for writing this clumsy expression will appear in
the sequel. We wish to find the mean error of the weighted
mean. We may apply Ch. IV, 5] and write for this the value

$$\frac{1}{K\sqrt{2}} = \sqrt{\sum_{i=1}^{n} \frac{a_i^2}{2k_i^2}}, \qquad \sum_{i=1}^{n} a_i = 1. \qquad (2)$$

When all of the measures are equally trustworthy, each $a_i$
is $1/n$ and each $k_i$ has the same value $k$, so that the mean
error of the weighted mean is

$$\frac{1}{K\sqrt{2}} = \frac{1}{k\sqrt{2n}}.$$

Theorem 4] The mean error of the average of a number of
equally trustworthy measurements is the mean error
of a single measurement, divided by the square root of
the number of measurements.
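
Theorem 4 lends itself to a trial by simulation. In the sketch below the accidental errors are drawn from a normal law of mean error `sigma`; that choice, like the function name, the seed and the numbers, is an assumption made for illustration, the theorem itself requiring no particular law.

```python
import random
from math import sqrt


def rms_error_of_average(n, sigma, trials, rng):
    """Root mean square of the error of the average of n measurements,
    each carrying an independent accidental error of mean error sigma."""
    total = 0.0
    for _ in range(trials):
        error_of_average = sum(rng.gauss(0.0, sigma) for _ in range(n)) / n
        total += error_of_average ** 2
    return sqrt(total / trials)


rng = random.Random(1)  # fixed seed, so that the run can be repeated
sigma, n = 2.0, 16      # hypothetical mean error and number of measurements
observed = rms_error_of_average(n, sigma, 20_000, rng)
predicted = sigma / sqrt(n)  # Theorem 4
print(observed, predicted)
```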
Let us see what values of the coefficients $a_i$ will minimize
the mean error of the weighted mean. We must minimize

$$\sum_{i=1}^{n} \frac{a_i^2}{k_i^2}, \qquad \sum_{i=1}^{n} a_i = 1.$$

This amounts to minimizing

$$\sum_{i=1}^{n} \left(\frac{a_i^2}{k_i^2} - 2\lambda a_i\right);$$

equating to 0 the partial derivative to $a_i$ we have

$$a_i = \lambda k_i^2.$$

When we are in the case where $x_i$ is the average of $n_i$
standard measurements this amounts to putting

$$k_i^2 = n_i k^2,$$

and gives the system of weighting already found. It is
natural, then, to make the minimizing of this mean value
a general principle, and state:

Postulate 5] The weights in the weighted mean are those
coefficients which will make the mean error of this
expression a minimum.

Theorem 5] The weights in the weighted mean are inversely
proportional to the squares of the mean errors of the
individual measurements.
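
A small numerical check of Theorem 5 may be helpful. The sketch below is illustrative; the function names and the three mean errors are hypothetical. It compares the mean square error of the weighted mean under the weights of the theorem with that under equal weights and under a slightly disturbed set of coefficients.

```python
def mean_square_error(coefficients, mean_errors):
    """Square of the mean error of sum(a_i * x_i) for independent measurements."""
    return sum((a * m) ** 2 for a, m in zip(coefficients, mean_errors))


def optimal_coefficients(mean_errors):
    """Coefficients inversely proportional to the squared mean errors,
    scaled so that their sum is unity."""
    raw = [1.0 / m ** 2 for m in mean_errors]
    total = sum(raw)
    return [w / total for w in raw]


mean_errors = [1.0, 2.0, 4.0]  # hypothetical mean errors of three measurements
best = optimal_coefficients(mean_errors)

best_mse = mean_square_error(best, mean_errors)
equal_mse = mean_square_error([1 / 3, 1 / 3, 1 / 3], mean_errors)

# Any other coefficients of sum unity should do worse.
disturbed = [best[0] + 0.01, best[1] - 0.01, best[2]]
disturbed_mse = mean_square_error(disturbed, mean_errors)

print(best, best_mse, equal_mse, disturbed_mse)
```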

The trouble with all of this work with the mean errors
is that we do not really know anything at all about the
errors actually committed. If we did, we should know
the true value sought. The best we can do is to manipulate
certain observed quantities nearly equal to the errors.

Definition] The difference between a measurement and the
weighted mean is called a residual error, or, more
briefly, a residual.

The residual corresponding to the measurement $x_i$ is

$$\epsilon_i = x_i - \bar{x} = (x_i - x) - \sum_{j=1}^{n} a_j (x_j - x).$$

Theorem 6] The mean value of a residual is 0.

The quantity in which we are particularly interested is

$$\frac{\sum_{i=1}^{n} p_i \epsilon_i^2}{\sum_{i=1}^{n} p_i}. \qquad (3)$$

When we are under the hypothesis of 3] this is the average
of certain observed quantities, and by Tchebycheff's inequality,
may then safely be replaced by its mean value. This leads
naturally to:

Assumption 3] The mean value of expression (3) may be
replaced by its observed value.
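Let us calculate this mean value.

$$\epsilon_i = x_i - \bar{x} = \frac{\sum_j p_j x_i - \sum_j p_j x_j}{\sum_j p_j} = \frac{\sum_j{}' p_j (x_i - x) - \sum_j{}' p_j (x_j - x)}{\sum_j p_j}.$$

The notation $\sum'$ means that the term with subscript $i$ is
lacking. The mean value of each individual term is 0; hence
the mean value of $\epsilon_i^2$ is

$$\frac{\dfrac{(\sum_j' p_j)^2}{2k_i^2} + \sum_j{}' \dfrac{p_j^2}{2k_j^2}}{(\sum_j p_j)^2}.$$

Now, by theorem 5],

$$\frac{1}{2k_i^2} = \frac{1}{p_i}\cdot\frac{1}{2k^2},$$

where $1/(k\sqrt{2})$ is the mean error of a measure of unit weight.
Hence the mean value of $\epsilon_i^2$ is

$$\frac{1}{2k^2 (\sum_j p_j)^2}\left[\frac{(\sum_j' p_j)^2}{p_i} + \sum_j{}' p_j\right] = \frac{\sum_j' p_j}{2k_i^2 \sum_j p_j}.$$

Equating the mean value of (3) to its observed value,

$$\frac{n-1}{2k^2 \sum_{i=1}^{n} p_i} = \frac{\sum_{i=1}^{n} p_i \epsilon_i^2}{\sum_{i=1}^{n} p_i},$$

$$\frac{1}{k\sqrt{2}} = \sqrt{\frac{\sum_{i=1}^{n} p_i \epsilon_i^2}{n-1}}.$$

We thus get from (2)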

Theorem 7] The mean error of the weighted mean of a set of
measurements whose weights are $p_1, p_2, \ldots p_n$, and whose
corresponding residuals are $\epsilon_1, \epsilon_2, \ldots \epsilon_n$, is

$$\frac{1}{K\sqrt{2}} = \sqrt{\frac{\sum_{i=1}^{n} p_i \epsilon_i^2}{(n-1)\sum_{i=1}^{n} p_i}}. \qquad (4)$$

Theorem 8] The mean error of a measure of weight $p_i$ under
the same circumstances is

$$\frac{1}{k_i\sqrt{2}} = \sqrt{\frac{\sum_{j=1}^{n} p_j \epsilon_j^2}{(n-1)\,p_i}}. \qquad (5)$$

Theorem 9] When each weighted observation is the average of
a number of standard measures, the mean error
of a standard measurement is

$$\frac{1}{k\sqrt{2}} = \sqrt{\frac{\sum_{i=1}^{n} p_i \epsilon_i^2}{n-1}}. \qquad (6)$$
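
Formulae (4), (5) and (6) can be gathered into a short routine. The sketch below is illustrative only: the function name `weighted_mean_report`, the three measures and their weights are hypothetical, and the residuals are taken as measurement minus weighted mean.

```python
from math import sqrt


def weighted_mean_report(values, weights):
    """Return the weighted mean together with the mean errors estimated
    from the residuals: of the weighted mean (4), of each measure (5),
    and of a measure of unit weight (6)."""
    n = len(values)
    total_weight = sum(weights)
    mean = sum(p * x for p, x in zip(weights, values)) / total_weight
    residuals = [x - mean for x in values]
    s = sum(p * e * e for p, e in zip(weights, residuals))
    error_of_mean = sqrt(s / ((n - 1) * total_weight))   # formula (4)
    error_of_each = [sqrt(s / ((n - 1) * p)) for p in weights]  # formula (5)
    error_of_unit = sqrt(s / (n - 1))                    # formula (6)
    return mean, error_of_mean, error_of_each, error_of_unit


# Hypothetical measures; the second is the average of two standard observations.
values = [10.0, 10.4, 9.9]
weights = [1.0, 2.0, 1.0]
mean, error_of_mean, error_of_each, error_of_unit = weighted_mean_report(values, weights)
print(mean, error_of_mean, error_of_each, error_of_unit)
```

A measure of weight 1 has, as it should, the mean error of a standard measurement, and the mean error of the weighted mean is that of a single measure whose weight is the sum of the weights.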

¶ We saw in the work which led up to postulate 4] that
when the given observations are equally trustworthy, the
average is that weighted mean which will have the least
mean error. Moreover, the sum of the squares of the actual
errors

$$\sum_{i=1}^{n} (x_i - x)^2$$

will be a minimum if

$$x = \frac{\sum_{i=1}^{n} x_i}{n},$$

and this gives additional reason to choose the average as
the best value. At the same time, there arise cases where the
observed values group themselves somewhat asymmetrically
about the average, and the question arises whether it be not
well to take a best value which will minimize some other
function of the observed measurements. For instance, what
value will minimize the sum of the numerical values of
the errors?

¶ If an assumed value lie between two observed values,
the sum of the numerical values of its divergences from the
two is equal to their numerical divergence. If it do not
lie between the two, this sum increases as the assumed value
recedes from the observed values, for it is equal to their
numerical difference, plus twice the divergence from the nearest
one. Let the observed values be $x_0, x_1, x_2, \ldots x_n$ arranged
in ascending order of magnitude, and let

$$x_i - x_{i-1} = r_i.$$

Suppose that we take a value $x$ where

$$x_{k-1} < x < x_k; \qquad 2k < n.$$

The sum of the numerical divergences of $x$ from the different
observed values will be

$$(r_1 + r_n) + 2(r_2 + r_{n-1}) + 3(r_3 + r_{n-2}) + \ldots$$

The value $x_k$ is that value in the interval in question which
will make this a minimum. In the same way, if $2k > n$ the
sum would be a minimum if $x_{k-1} = x$.

If $x$ lie in the middle interval, the sum will be the same
throughout.

Definition] The middle term in order of magnitude of an
odd number of terms and the average of the middle
terms of an even number of terms is called the median
of the series.

We thus have a theorem due, apparently, to Fechner.*

¶ Theorem 10] The sum of the numerical values of the
divergences of a number from a given series of numbers
will be a minimum if the number in question be the
median.

Another value occasionally used, especially in statistical
work, is the mode, which is the point of accumulation of the
given set of measures.

Problems.

1. An angle was measured by a theodolite (mean error 46.5") to be
29° 13' 40", and by a transit (mean error 26.3") to be 29° 13' 24". Find
best value and its mean error.

2. A distance was measured as follows:

A) with steel tape    741.17;  741.09;  741.22;  741.12;  741.01.

B) with chain         741.2;   741.4;   741.0;   741.3;   741.1.

Find best value and mean error.
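
Theorem 10, on the median, can be verified numerically for any particular series. The sketch below is illustrative; the function name `total_divergence`, the five observed values and the trial grid are hypothetical.

```python
from statistics import mean, median


def total_divergence(x, observed):
    """Sum of the numerical values of the divergences of x from the observed values."""
    return sum(abs(x - v) for v in observed)


# Hypothetical observed values, grouped asymmetrically about their average.
observed = [2.0, 3.5, 3.7, 4.1, 9.0]

m = median(observed)
average = mean(observed)

# Search a fine grid of trial values for the one with the least total divergence.
grid = [i / 100 for i in range(0, 1201)]
best_on_grid = min(grid, key=lambda x: total_divergence(x, observed))

print(m, best_on_grid, total_divergence(m, observed), total_divergence(average, observed))
```

With an odd number of observed values the minimum falls at a single point, the median; the average, drawn towards the outlying value, gives a larger sum.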

§  2.    The  Law  of  Error. 

We  saw  at  the  beginning  of  the  present  chapter  that  the 
title  of  the  present  section  is  essentially  a  misnomer.  There 
is  no  such  thing  in  Nature  as  a  law  of  error,  i.e.  a  fixed 
principle  according  to  which  accidental  errors  are  always 
distributed.  For  mathematical  purposes  we  desire  a  continuous 
function,  with  a  certain  number  of  continuous  derivatives, 
which  will  express  the  probability  for  an  error  of  given 
magnitude,  but  in  fact  there  is  a  certain  number  which 
represents the maximum possible numerical error. The probability
for an error very close to this will be finite, the
probability for any numerically greater error is rigorously
zero. Consequently no analytic function whose argument
runs from $-\infty$ to $\infty$ can fit the case for all values of that
argument.

We mean, then, by the law of error a mathematical formula,
reached by plausible reasoning, which in practice will give
approximately the proportion of accidental errors in any
appropriate interval. To make the law as plausible as
possible, we shall start from the broadest assumptions that
will give what we want, a set considerably broader than
that usually taken, but the acid test will lie in the question:
do observed errors conform with any reasonable degree of
closeness to the law which has been deduced?

* Fechner, 'Ueber den Ausgangswerth der kleinsten Abweichungen',
Sitzungsberichte der K. Akademie der Wissenschaft zu Leipzig, vol. xi, 1874, p. 29.

Assumption 4] The a priori probability that a quantity to be
observed shall have a value in the infinitesimal interval
$x \pm \tfrac{1}{2}dx$, $x$ being in a certain continuous region $S$, will
differ by an infinitesimal of higher order from $f(x)\,dx$,
where $f$ is a function, single-valued and analytic,
throughout the whole reach of possible values.

Assumption 5] The probability that a quantity whose true
value is X should under specified conditions be observed
to have after the removal of constant errors a value
in the infinitesimal region $x \pm \tfrac12 dx$, where x is a point
of S, will differ by an infinitesimal of higher order
from $\Phi(X, x)\,dx$ where the function $\Phi$ and its partial
derivatives of the first two orders are continuous,
and where its value is independent of the choice of
origin.

Assumption 6] If the infinitesimal increment dx be sufficiently
small the probability that the true value lies in the
region $\bar{x} \pm \tfrac12 dx$ is a maximum when $\bar{x}$ is the weighted
average of the observed values.

It is evident that these assumptions have not absolutely
axiomatic force, yet all are reasonably plausible. To assume
that the function $\Phi$ is independent of the origin is natural,
for we expect that accidental errors will arise from physical
causes, and not from the position of the 0 on the recording
instrument. As for the last assumption, our continuous
function must have a maximum somewhere in the region, and
the weighted average seems as likely to give that maximum
as any other number we could naturally think of.

Let  us  proceed  to  deduce  our  law  from  these  assumptions. 
Since $\Phi$ is independent of the origin

\[ \Phi(X + k,\, x + k) = \Phi(X, x). \]

Putting $k = -X$,

\[ \Phi(X, x) = \Phi(0,\, x - X) = \phi(x - X) = \phi(\xi). \]

It appears, therefore, that $\Phi$ is a function of the error alone.
This fact, which is sometimes assumed in so many words, has
given rise to criticism, yet it follows at once from our plausible
assumption about the independence of the origin.* As
matter of notation, let us write

Observed values      $x_1, x_2, \ldots, x_n$.

Weights              $p_1, p_2, \ldots, p_n$.

Weighted average     $\bar{x} = \sum p_i x_i / \sum p_i$.

True value           $X$.

Observed errors      $\xi_i = x_i - X$.

Residual errors      $\delta_i = x_i - \bar{x}$.

The probability that these observations arose from a quantity
whose true value is X is given by Bayes' formula for continuous
probability, developed in Ch. VI, p. 98,

\[ \frac{f(X)\,\phi(x_1 - X)\,\phi(x_2 - X)\cdots\phi(x_n - X)}{\displaystyle\int f(X)\,\phi(x_1 - X)\,\phi(x_2 - X)\cdots\phi(x_n - X)\,dX}. \tag{7} \]


The integration in the denominator is supposed to be
extended throughout the whole of the region S. This expression
will be a maximum with the logarithm of the
numerator. Equating the logarithmic derivative to 0, and
remembering that $\xi_i = x_i - X$,

\[ \frac{d\log f}{dX} - \left[\frac{d\log\phi(\xi_1)}{d\xi_1} + \frac{d\log\phi(\xi_2)}{d\xi_2} + \cdots + \frac{d\log\phi(\xi_n)}{d\xi_n}\right] = 0. \tag{8} \]

The first term is independent of the observed values.
Suppose that we are so lucky as to get exactly the right
value each time, an allowable case,

\[ \frac{d\log f}{dX} - n\left[\frac{d\log\phi(\xi)}{d\xi}\right]_{\xi=0} = 0. \tag{9} \]

Now the function f is independent of n, hence

\[ \frac{d\log f}{dX} = 0, \qquad f = \text{const.} \tag{10} \]

We  have  thus  removed  the  troublesome  a  priori  probabilities 
from  our  path. 

* Bertrand, loc. cit., p. 177; Poincaré, loc. cit., p. 152.

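Going back to the general case, assume that $\bar{x}$ remains
fixed, while $x_1, x_2, \ldots, x_n$ take infinitesimal increments.
Since f is constant, (8) gives on differentiation

\[ \frac{d^2\log\phi(\xi_1)}{d\xi_1^2}\,d\xi_1 + \frac{d^2\log\phi(\xi_2)}{d\xi_2^2}\,d\xi_2 + \cdots + \frac{d^2\log\phi(\xi_n)}{d\xi_n^2}\,d\xi_n = 0, \tag{11} \]

\[ p_1\,d\xi_1 + p_2\,d\xi_2 + \cdots + p_n\,d\xi_n = 0. \tag{12} \]

One of these equations in the variables $d\xi_1, d\xi_2, \ldots$ holds
whenever the other does, hence

\[ \frac{1}{p_1}\frac{d^2\log\phi(\xi_1)}{d\xi_1^2} = \frac{1}{p_2}\frac{d^2\log\phi(\xi_2)}{d\xi_2^2} = \cdots = -2l. \]

Integrating once

\[ \frac{d\log\phi(\xi_i)}{d\xi_i} = -2l\,p_i\,\xi_i + c, \]

and the constant c is seen to be 0 by (8) and (10).

Integrating again

\[ \phi(\xi_i) = r\,e^{-l p_i \xi_i^2}. \tag{13} \]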

It is evident, on the face of things, that this formula cannot
be strictly correct outside of certain definite limits, for it
gives a finite probability for an obviously impossible error.
It is also clear that the statement that f is constant could not
hold from infinity to infinity, as that would involve the
ridiculous conclusion that

\[ \int_{-\infty}^{\infty} f\,dX = 1. \]

We note, however, that: 1) it seems plausible to assume that
f is constant throughout a certain region, and drops to zero
rapidly outside; 2) expression (13) becomes rapidly very small.
The effect called for by the first of these will be sensibly
produced by assuming (13) to hold everywhere.
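
The agreement between the deduced law and Assumption 6] can be checked numerically. The sketch below assumes that the law has the form $r\,e^{-l p_i \xi_i^2}$ with a positive constant $l$ (our reading of (13)) and that f is constant; the function names and the sample figures are illustrative only.

```python
def weighted_mean(values, weights):
    """Weighted average: sum(p_i * x_i) / sum(p_i)."""
    return sum(p * x for x, p in zip(values, weights)) / sum(weights)


def log_posterior(true_value, values, weights, l=1.0):
    """Logarithm of the numerator of Bayes' formula, up to an additive constant.

    Assumes a constant a priori density f and an error law proportional to
    exp(-l * p_i * xi_i**2), where xi_i = x_i - true_value.
    """
    return -l * sum(p * (x - true_value) ** 2 for x, p in zip(values, weights))


if __name__ == "__main__":
    xs = [10.1, 9.8, 10.4]   # hypothetical observations
    ps = [1, 2, 3]           # hypothetical weights
    m = weighted_mean(xs, ps)
    # The logarithm is largest at the weighted mean and falls off on each side.
    for candidate in (m - 0.05, m, m + 0.05):
        print(round(candidate, 4), round(log_posterior(candidate, xs, ps), 6))
```

Under these assumptions the most probable true value coincides with the weighted average, as Assumption 6] demands.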

Assumption 7] For the purpose of calculating constants
formula (13) may be assumed universally true.

Assumption 8] For the purpose of calculating constants, when
the observations are all of equal weight, the mean
value of $\delta^2 = \frac{n-1}{n}\,\mathrm{M.V.}\ \xi^2$ may be replaced by the
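observed average.

Replacing this latter by the familiar expression $1/2k_i^2$,
the constants of (13) must satisfy

\[ \int_{-\infty}^{\infty}\phi(\xi_i)\,d\xi_i = 1, \qquad \int_{-\infty}^{\infty}\xi_i^2\,\phi(\xi_i)\,d\xi_i = \frac{1}{2k_i^2}. \]

Putting $\xi_i\sqrt{l p_i} = t$,

\[ \frac{2r}{\sqrt{l p_i}}\int_0^{\infty} e^{-t^2}\,dt = 1, \qquad \frac{2r}{(l p_i)^{3/2}}\int_0^{\infty} t^2 e^{-t^2}\,dt = \frac{1}{2k_i^2}, \]

\[ \frac{r\sqrt{\pi}}{\sqrt{l p_i}} = 1, \qquad \frac{r\sqrt{\pi}}{2\,(l p_i)^{3/2}} = \frac{1}{2k_i^2}, \]

so that $l p_i = k_i^2$ and $r = k_i/\sqrt{\pi}$.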

Dropping the subscript, we have finally:

Gauss's Exponential Law of Error.* The probability, under
Assumptions 1-6, that the observed measurement of
a quantity shall have an accidental error in the
infinitesimal region $\xi \pm \tfrac12 d\xi$ differs by an infinitesimal
of higher order from the expression

\[ \frac{k}{\sqrt{\pi}}\,e^{-k^2\xi^2}\,d\xi, \tag{14} \]

where the mean error of a single observation is

\[ \frac{1}{k\sqrt{2}}. \tag{15} \]
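
The two properties just stated, that the law (14) has total probability 1 and mean error $1/k\sqrt{2}$, can be verified by direct quadrature. The following is only a sketch: the Simpson rule, the cut-off at ten times $1/k$, and the function names are our own choices, not part of the deduction.

```python
import math


def gauss_density(xi, k):
    """Density (k / sqrt(pi)) * exp(-k^2 xi^2) of formula (14)."""
    return k / math.sqrt(math.pi) * math.exp(-(k * xi) ** 2)


def simpson(func, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def check_law(k):
    """Return (total probability, root-mean-square error) found by quadrature."""
    limit = 10.0 / k  # beyond this the density is negligible
    total = simpson(lambda x: gauss_density(x, k), -limit, limit)
    mean_square = simpson(lambda x: x * x * gauss_density(x, k), -limit, limit)
    return total, math.sqrt(mean_square)


if __name__ == "__main__":
    k = 2.0
    total, rms = check_law(k)
    print(total, rms, 1 / (k * math.sqrt(2)))
```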


¶ It is evident that of all of our assumptions, the least
plausible is 6]. It has been suggested that it would be more
natural, not to assume that the weighted mean gave the
greatest possible probability to the observed series, but that
it was the mean of all possible values, in view of the ones
that had been observed. This can be carried through, but
the calculation is long.†

* We have given essentially Gauss's first deduction, which appears in all
text-books on Least Squares. The original is in his 'Theoria Motus Corporum
Coelestium'. See his Collected Works, vol. vii, Hamburg, 1809, p. 232. But
Gauss assumes explicitly that $\phi$ is a function of the error alone, and that
f is a constant.

† Poincaré, loc. cit., p. 156.

The form of the probability function is not in the least
surprising. We saw in discussing Assumption 2] that if we
assumed that the actual accidental error committed was the
surplus of positive over negative elementary errors, or vice
versa, that this assumption would be fulfilled. The error
would be the discrepancy in a series of trials where there
was an even chance for success or failure, and our present
formula (14) is merely a restatement of formula (10) of
Ch. III. This method of reaching the probability function
is beyond a peradventure much the simplest,* the only trouble
is, as we have already seen, there is no real reason to believe
that such things as elementary errors really exist in practice.
The fundamental constant k that appears in the formula
is called the precision. It is inversely proportional to the
mean error of a single observation, and directly proportional
to the square root of the weight that should be attached to
that observation in combining it with others. In actual
practice, however, especially in the United States, it is more
customary to give the probable error than the mean error or
the precision. To find the probable error p we put

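\[ \frac{2k}{\sqrt{\pi}}\int_0^{p} e^{-k^2\xi^2}\,d\xi = \frac12. \]

Let $k\xi = t$,

\[ \frac{2}{\sqrt{\pi}}\int_0^{kp} e^{-t^2}\,dt = \frac12, \qquad kp = 0.4769, \]

\[ p = 0.4769\left(\frac{1}{k}\right). \tag{16} \]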


Again, to find the average error, since positive and negative
errors are equally likely, we have

\[ \text{Av. error} = \frac{2}{k\sqrt{\pi}}\int_0^{\infty} t\,e^{-t^2}\,dt, \]

\[ \text{Av. error} = \frac{1}{k}\cdot\frac{1}{\sqrt{\pi}}. \tag{17} \]


* This method of deduction is apparently due to Hagen, Grundzüge der
Wahrscheinlichkeitsrechnung, Berlin, 1837.

Theorem 11] If a set of measurements follow the Law of
Gauss, the mean error, probable error, and average
error are constant multiples one of another.
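
The constant multiples of Theorem 11] are easily computed. The sketch below takes the three definitions from (15), (16), and (17); the probable error is found by bisection on the standard error function, and the function name is ours.

```python
import math


def error_measures(k):
    """Mean, probable, and average error for the Gauss law with precision k."""
    mean_error = 1.0 / (k * math.sqrt(2.0))          # formula (15)
    average_error = 1.0 / (k * math.sqrt(math.pi))   # formula (17)
    # Probable error: solve erf(k * p) = 1/2 by bisection on t = k * p.
    lo, hi = 0.0, 5.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if math.erf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    probable_error = lo / k                          # formula (16)
    return mean_error, probable_error, average_error


if __name__ == "__main__":
    mean_e, prob_e, avg_e = error_measures(3.0)
    # The ratios do not depend on k.
    print(prob_e / mean_e, avg_e / mean_e)
```

The two ratios are the multipliers 0.6745 and 0.798 that serve to pass from the mean error to the probable and average errors.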

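Suppose that we have two independent quantities $x_1$ and $x_2$
of such a nature that the measurements of each follow the
law of Gauss, their respective precisions being $k_1$ and $k_2$.
What will be the law of error for the expression

\[ x' = a_1 x_1 + a_2 x_2\,? \]

The probability that $\xi'$ should be in a given infinitesimal
region differs by an infinitesimal of higher order from

\[ \frac{k_1 k_2}{\pi}\iint e^{-(k_1^2\xi_1^2 + k_2^2\xi_2^2)}\,d\xi_1\,d\xi_2. \]

This integral is extended over so much of the $\xi_1, \xi_2$ plane
as will make $\xi'$ lie in the infinitesimal region demanded.
We proceed to change variables in this integral, putting

\[ \xi' = a_1\xi_1 + a_2\xi_2, \qquad \xi'' = b_1\xi_1 + b_2\xi_2. \]

We will choose $b_1$ and $b_2$ in such a way that when

\[ k_1^2\xi_1^2 + k_2^2\xi_2^2 \]

is expressed in terms of $\xi'$ and $\xi''$ there will be no product
term. This gives

\[ k_1^2 b_2 a_2 + k_2^2 b_1 a_1 = 0, \]

and, taking $b_1 = k_1^2 a_2$, $b_2 = -k_2^2 a_1$,

\[ k_1^2\xi_1^2 + k_2^2\xi_2^2 = \frac{k_1^2 k_2^2\,(\xi')^2 + (\xi'')^2}{k_1^2 a_2^2 + k_2^2 a_1^2}. \]

Hence our integral above is

\[ \frac{k_1 k_2\,d\xi'}{\pi\,(k_1^2 a_2^2 + k_2^2 a_1^2)}\; e^{-\frac{k_1^2 k_2^2 (\xi')^2}{k_1^2 a_2^2 + k_2^2 a_1^2}} \int_{-\infty}^{\infty} e^{-\frac{(\xi'')^2}{k_1^2 a_2^2 + k_2^2 a_1^2}}\,d\xi'' = \frac{k_1 k_2}{\sqrt{\pi}\,\sqrt{k_1^2 a_2^2 + k_2^2 a_1^2}}\; e^{-\frac{k_1^2 k_2^2 (\xi')^2}{k_1^2 a_2^2 + k_2^2 a_1^2}}\,d\xi'. \]

Putting

\[ \frac{1}{k'^2} = \frac{a_1^2}{k_1^2} + \frac{a_2^2}{k_2^2}, \]

we get

\[ \frac{k'}{\sqrt{\pi}}\,e^{-k'^2 \xi'^2}\,d\xi'. \]

Our new series of observations follow the law of Gauss
with the precision $k'$. We thus reach, by mathematical
induction,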

Theorem 12] If $x_1, x_2, \ldots, x_n$ be n independent quantities
whose measures follow the law of Gauss with the respective
precisions $k_1, k_2, \ldots, k_n$, the quantity

\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \]

obeys the same law with the precision $k'$ where

\[ \frac{1}{k'^2} = \frac{a_1^2}{k_1^2} + \frac{a_2^2}{k_2^2} + \cdots + \frac{a_n^2}{k_n^2}. \tag{18} \]

Where the quantity under observation is the weighted
mean we have

\[ a_i = \frac{p_i}{\sum_j p_j} = \frac{k_i^2}{\sum_j k_j^2}. \]

Theorem 13] If a series of observations obey the law of Gauss
with the precisions $k_1, k_2, \ldots, k_n$ respectively, their
weighted mean will obey the same law with a precision
K, where

\[ K = \sqrt{\sum_j k_j^2}. \tag{19} \]
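
Formulae (18) and (19) lend themselves to a short computation. In the sketch below the function names are ours; the weights of the weighted mean are taken proportional to the squares of the precisions, as in the text.

```python
import math


def precision_of_combination(coefficients, precisions):
    """Precision k' of a_1 x_1 + ... + a_n x_n, from 1/k'^2 = sum(a_i^2 / k_i^2)."""
    return 1.0 / math.sqrt(sum((a / k) ** 2 for a, k in zip(coefficients, precisions)))


def precision_of_weighted_mean(precisions):
    """Return the coefficients a_i = k_i^2 / sum(k_j^2) and the resulting precision."""
    total = sum(k * k for k in precisions)
    coefficients = [k * k / total for k in precisions]
    return coefficients, precision_of_combination(coefficients, precisions)


if __name__ == "__main__":
    ks = [1.0, 2.0, 2.0]
    coeffs, big_k = precision_of_weighted_mean(ks)
    # big_k should agree with sqrt(1 + 4 + 4) = 3, as formula (19) requires.
    print(coeffs, big_k)
```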

A residual is an observation which is linear in the given
system of observations. Its mean value and its true value
are 0 by 6]. If $\epsilon_i$ be the ith residual,

\[ \epsilon_i = \frac{(\sum p_j - p_i)\,x_i - p_j x_j - p_k x_k - \cdots}{\sum p_j} = \frac{(\sum k_j^2 - k_i^2)\,x_i - k_j^2 x_j - k_k^2 x_k - \cdots}{\sum k_j^2}. \]

For the precision $k_i'$ we have

\[ \frac{1}{k_i'^2} = \frac{1}{k_i^2} - \frac{1}{\sum_j k_j^2}. \]

Theorem 14] If a set of measurements follow the law of Gauss
with the respective precisions $k_1, k_2, \ldots, k_n$, the ith
residual follows that law with the precision $k_i'$ where

\[ \frac{1}{k_i'^2} = \frac{1}{k_i^2} - \frac{1}{\sum_j k_j^2}. \tag{20} \]

Theorem 15] The precision of a residual of n observations of
precision k is

\[ k\left(\frac{n}{n-1}\right)^{1/2}. \tag{21} \]

To find $k'$ replace $(n-1)$ by n in (6).
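
As a check on the relation between the precision of a residual and that of the observations, the following sketch evaluates $1/k_i'^2 = 1/k_i^2 - 1/\sum k_j^2$ and compares the case of equal precisions with the factor $\sqrt{n/(n-1)}$. The function name is illustrative, and at least two observations are assumed.

```python
import math


def residual_precision(precisions, i):
    """Precision of the i-th residual: 1/k_i'^2 = 1/k_i^2 - 1/sum(k_j^2).

    Requires at least two observations, otherwise the residual is identically 0.
    """
    total = sum(k * k for k in precisions)
    return 1.0 / math.sqrt(1.0 / precisions[i] ** 2 - 1.0 / total)


if __name__ == "__main__":
    n, k = 5, 2.0
    # With equal precisions the result should be k * sqrt(n / (n - 1)).
    print(residual_precision([k] * n, 0), k * math.sqrt(n / (n - 1)))
```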

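For convenient reference, let us make a table of the results
of (4), (5), (6), (16), and (17).

Table.

Given n observations of weights $p_1, p_2, \ldots, p_n$.
Let the corresponding residuals be $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$.

\[
\begin{array}{l|c|c|c|c}
 & \text{Precision} & \text{Mean error} & \text{Probable error} & \text{Average error} \\ \hline
\text{Standard obs.} & k = \sqrt{\dfrac{n-1}{2\sum\epsilon_j^2}} & \sqrt{\dfrac{\sum\epsilon_j^2}{n-1}} & 0.6745\sqrt{\dfrac{\sum\epsilon_j^2}{n-1}} & 0.798\sqrt{\dfrac{\sum\epsilon_j^2}{n-1}} \\[2ex]
\text{Average of } n & \sqrt{\dfrac{n(n-1)}{2\sum\epsilon_j^2}} & \sqrt{\dfrac{\sum\epsilon_j^2}{n(n-1)}} & 0.6745\sqrt{\dfrac{\sum\epsilon_j^2}{n(n-1)}} & 0.798\sqrt{\dfrac{\sum\epsilon_j^2}{n(n-1)}} \\[2ex]
\text{Obs. weight } p_i & k_i = \sqrt{\dfrac{(n-1)\,p_i}{2\sum p_j\epsilon_j^2}} & \sqrt{\dfrac{\sum p_j\epsilon_j^2}{(n-1)\,p_i}} & 0.6745\sqrt{\dfrac{\sum p_j\epsilon_j^2}{(n-1)\,p_i}} & 0.798\sqrt{\dfrac{\sum p_j\epsilon_j^2}{(n-1)\,p_i}} \\[2ex]
\text{Weighted mean} & K = \sqrt{\dfrac{(n-1)\sum p_j}{2\sum p_j\epsilon_j^2}} & \sqrt{\dfrac{\sum p_j\epsilon_j^2}{(n-1)\sum p_j}} & 0.6745\sqrt{\dfrac{\sum p_j\epsilon_j^2}{(n-1)\sum p_j}} & 0.798\sqrt{\dfrac{\sum p_j\epsilon_j^2}{(n-1)\sum p_j}}
\end{array}
\]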

When it comes to making these various summations, there
are one or two simple expedients which will materially lighten
the labour of a computer not provided with an adding
machine. The latter is practically indispensable when the
mass of data is large. Let the observed values, as usual, be
$x_1, x_2, \ldots, x_n$. Arrange these in order of magnitude. Let
the weighted mean be

\[ \bar{x} = \frac{\sum p_j x_j}{\sum p_j}. \]

Choose any convenient number $x_0$, either the numerically
smallest, or the median, or any that may seem helpful.

\[ \epsilon_j = x_j - \bar{x} = (x_j - x_0) + (x_0 - \bar{x}), \]

\[ \epsilon_j^2 = (x_j - \bar{x})^2 = (x_j - x_0)^2 + 2\,(x_j - x_0)(x_0 - \bar{x}) + (x_0 - \bar{x})^2, \]

\[ \bar{x} = x_0 + \frac{\sum p_j (x_j - x_0)}{\sum p_j}, \tag{22} \]

\[ \frac{\sum p_j \epsilon_j^2}{\sum p_j} = \frac{\sum p_j (x_j - x_0)^2}{\sum p_j} - (x_0 - \bar{x})^2. \tag{23} \]

Let the reader prove the following formula, of use later,

\[ \frac{\sum p_j (x_j - \bar{x})(y_j - \bar{y})}{\sum p_j} = \frac{\sum p_j (x_j - x_0)(y_j - y_0)}{\sum p_j} - (x_0 - \bar{x})(y_0 - \bar{y}). \]

These  devices  are  particularly  useful  when  the  observations 
and  weights  are  integers,  but  the  weighted  mean  is  not. 
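
The device of a provisional value may be illustrated as follows. The sketch works with exact fractions, so that integral observations and weights give an exact weighted mean; it computes the mean and the weighted sum of squared residuals by way of $x_0$ and compares them with the direct summation. The function names and sample figures are ours.

```python
from fractions import Fraction


def sums_via_provisional_value(values, weights, x0):
    """Weighted mean and sum of p_j * eps_j^2 computed from deviations about x0."""
    total_weight = Fraction(sum(weights))
    first = Fraction(sum(p * (x - x0) for x, p in zip(values, weights)))
    second = Fraction(sum(p * (x - x0) ** 2 for x, p in zip(values, weights)))
    mean = x0 + first / total_weight                          # provisional value plus correction
    mean_square = second / total_weight - (x0 - mean) ** 2    # mean square residual
    return mean, mean_square * total_weight


def sums_direct(values, weights):
    """The same two quantities by direct summation about the weighted mean."""
    total_weight = Fraction(sum(weights))
    mean = Fraction(sum(p * x for x, p in zip(values, weights))) / total_weight
    squares = sum(p * (x - mean) ** 2 for x, p in zip(values, weights))
    return mean, squares


if __name__ == "__main__":
    xs = [3, 5, 8, 10]   # hypothetical integral observations
    ps = [1, 2, 2, 1]    # hypothetical integral weights
    print(sums_via_provisional_value(xs, ps, 5))
    print(sums_direct(xs, ps))
```

Only small integers are squared in the first routine, which is the whole advantage of the method for hand calculation.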

¶ The labour of calculation may be further reduced as
follows. Let us first recall Tchebycheff's principle whereby
an average will probably be close to its mean value. Then
the expression

\[ \frac{\sum p_j\,|\xi_j|}{\sum p_j} \]

will be close to the average error, and the expression

\[ \frac{0.6745}{0.798}\cdot\frac{\sum p_j\,|\xi_j|}{\sum p_j\,\sqrt{n-1}} \]

will be close to the probable error. Replacing the unknown
$\xi$'s by the residuals we get:

Approximate formula for probable error of weighted mean

\[ 0.845\,\frac{\sum p_j\,|\epsilon_j|}{\sum p_j\,\sqrt{n-1}}. \tag{24} \]

The other probable errors may be easily calculated from this.
The approximate precision will be found from the equation

\[ \frac{1}{K} = 1.77\,\frac{\sum p_j\,|\epsilon_j|}{\sum p_j\,\sqrt{n-1}}. \tag{25} \]

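When all of the measurements have the same weight, the
precision of the average will be given by

\[ \frac{1}{K} = 1.77\,\frac{\sum |\epsilon_j|}{n\sqrt{n-1}}. \tag{26} \]

The precision of a single observation will be k where

\[ \frac{1}{k} = 1.77\,\frac{\sum |\epsilon_j|}{\sqrt{n(n-1)}}, \tag{27} \]

and the precision of a residual will be $k'$ where

\[ \frac{1}{k'} = 1.77\,\frac{\sum |\epsilon_j|}{n}. \tag{28} \]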

The way to test in practice whether a series of observations
conform to the Gauss law is as follows. Calculate the precision
of a residual by the general formula in the table, and
(21), or by approximate formula (28). The number of
observations having a residual numerically not above $\epsilon$
should be close to

\[ n\,\Theta(k'\epsilon). \tag{29} \]

As a quick check, note whether nearly one-half the measures
have a residual not greater than the probable error. Here is
an example.*

* I owe this example to Messrs. William Eldredge and Denning Miller.

Example] In the years 1904 and 1905, 104 tests were made
of the atomic weight of iodine in the Chemical Laboratory
of Harvard University. Taking as a first
approximation $x_0 = 126.980$ and multiplying the
residuals by 1000 we get


10^3(x-x_0)  10^3(x-x̄)  10^6(x-x̄)^2  |  10^3(x-x_0)  10^3(x-x̄)  10^6(x-x̄)^2
     13          11          121      |       3           1            1
     13          11          121      |       3           1            1
     11           9           81      |       2           0            0
     11           9           81      |       1           1            1
     10           8           64      |       0           2            4
     10           8           64      |       0           2            4
      9           7           49      |       0           2            4
      8           6           36      |      -1           3            9
      8           6           36      |      -1           3            9
      8           6           36      |      -1           3            9
      7           5           25      |      -3           5           25
      7           5           25      |      -3           5           25
      7           5           25      |      -3           5           25
      7           5           25      |      -3           5           25
      6           4           16      |      -4           6           36
      6           4           16      |      -4           6           36
      6           4           16      |      -5           7           49
      5           3            9      |      -5           7           49
      5           3            9      |      -6           8           64
      5           3            9      |      -7           9           81
      5           3            9      |      -7           9           81
      5           3            9      |      -7           9           81
      4           2            4      |      -8          10          100
      4           2            4      |     -11          13          169
      3           1            1      |     -11          13          169
      3           1            1      |     -13          15          225
    ---         ---          ---      |     ---         ---        -----
    206         134          892      |     -94         150        1,282

x̄ = 126.982.

1/k' = 0.0092 [by table].

1/k' = 0.0096 [by (28)].

Errors less than    Observed.    Calculated.
     0.001              6            6.3
     0.002             11           12.5
     0.003             19           18.8
     0.004             22           24.5
     0.005             30           29
     0.006             35           33.4

The  discrepancy  is  never  greater  than  4  per  cent.,  generally 
less.  On  the  other  hand,  the  residuals  are  not  symmetrically 
distributed  above  and  below. 
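
The arithmetic of this example may be retraced mechanically. The sketch below transcribes the 52 tabulated values of $10^3(x - x_0)$, takes the rounded mean 126.982 from the text, and assumes that the precision of a residual "by table" is obtained by replacing $n - 1$ by n in the expression for a standard observation; the names are ours, and the calculated column is formed with the standard error function in the place of $\Theta$.

```python
import math

# Values of 10^3 (x - x0) transcribed from the table, with x0 = 126.980.
DEVIATIONS = [13, 13, 11, 11, 10, 10, 9, 8, 8, 8, 7, 7, 7, 7, 6, 6, 6,
              5, 5, 5, 5, 5, 4, 4, 3, 3,
              3, 3, 2, 1, 0, 0, 0, -1, -1, -1, -3, -3, -3, -3, -4, -4,
              -5, -5, -6, -7, -7, -7, -8, -11, -11, -13]
SHIFT = 2  # the text rounds the mean to 126.982, so 10^3 (mean - x0) = 2


def residual_summary(deviations=DEVIATIONS, shift=SHIFT):
    """Residuals in units of 0.001, their sums, and 1/k' by both routes."""
    residuals = [d - shift for d in deviations]
    n = len(residuals)
    sum_sq = sum(r * r for r in residuals)
    sum_abs = sum(abs(r) for r in residuals)
    inv_k_table = math.sqrt(2.0 * sum_sq / n) * 1e-3   # assumed rule: n in place of n - 1
    inv_k_approx = 1.77 * sum_abs / n * 1e-3           # approximate formula for a residual
    return residuals, n, sum_sq, sum_abs, inv_k_table, inv_k_approx


def observed_and_calculated(limits=(1, 2, 3, 4, 5, 6)):
    """Rows (limit, observed count, n * erf(k' * limit)) for the comparison table."""
    residuals, n, _, _, inv_k, _ = residual_summary()
    rows = []
    for m in limits:
        observed = sum(1 for r in residuals if abs(r) <= m)
        calculated = n * math.erf(m * 1e-3 / inv_k)
        rows.append((m * 1e-3, observed, calculated))
    return rows


if __name__ == "__main__":
    print(residual_summary()[1:])
    for row in observed_and_calculated():
        print(row)
```

The calculated figures obtained in this way need not agree to the last digit with those printed above, since the rounding of $1/k'$ affects them.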


Problem. 

Work out a similar table.

¶ § 3.  Doubtful Observations.

It will frequently occur that when a large number of
measurements have been taken of the same quantity, there
will be one or more that differ very sharply from all of the
others. These observations create a strong suspicion that in
their cases there were additional causes of disturbance at
work that did not apply in the case of the other measurements,
and that, in consequence, these exceptional values
should be rejected in making a calculation of the probable
error or precision. This question, as we shall see, is exceedingly
delicate, but it is insistent, and there can be no
doubt that many observers reject some of their observations
by pure guess-work or common sense.

Bertrand  has  pointed  out  by  an  ingenious  analysis  *  that  if 
we  assume  all  of  our  measures  to  be  equally  trustworthy, 
and  reject  the  worst  ones,  we  shall  decrease  the  probable 
error  of  the  weighted  mean.     The  reasoning  is  as  follows  : 

Suppose that we reject those observations whose errors are
so large numerically that the chance is less than $1 - p$ of
committing them. We have as a limit of error $\lambda$, where

\[ p = \frac{2}{\sqrt{\pi}}\int_0^{k\lambda} e^{-t^2}\,dt = \Theta(k\lambda). \tag{30} \]

Let $\xi_1, \xi_2, \xi_3, \ldots, \xi_m$ be the errors of the observations
$x_1, x_2, \ldots, x_m$ which are retained. Assuming all measures of
equal weight, let us find the square of the mean error of our
new average

\[ \frac{x_1 + x_2 + \cdots + x_m}{m}. \]

This will not be $1/2mk^2$ as the reader might suppose, for
some observations, the worst, have been rejected, but will be

\[ \frac{1}{m}\,[\text{mean value of } \xi^2], \]

when we mean by mean value of $\xi^2$, the mean value under
the present circumstances when the worst measurements have
been rejected.

If we examine statistically into the probability that an
error shall take a particular value numerically less than $\lambda$,
that probability will be greater than it was before, for the
numerator is the same, but the denominator has been reduced
as the errors numerically above $\lambda$ have been rejected. We
shall have, approximately, $m = np$.

* Bertrand, loc. cit., p. 211.

The mean value of the square of an error less than $\lambda$
numerically will now be

\[ \frac{2k}{p\sqrt{\pi}}\int_0^{\lambda} \xi^2 e^{-k^2\xi^2}\,d\xi. \]

Integrating by parts, and remembering (30), we have

\[ -\frac{2k}{p\sqrt{\pi}}\left[\frac{\xi\,e^{-k^2\xi^2}}{2k^2}\right]_0^{\lambda} + \int_0^{\lambda}\frac{2k}{p\sqrt{\pi}}\left(\frac{e^{-k^2\xi^2}}{2k^2}\right)d\xi = \frac{1}{2k^2}\left[1 - \frac{2k\lambda\,e^{-k^2\lambda^2}}{p\sqrt{\pi}}\right]. \]

Dividing by m, or rather np, we have for the square of the
mean error of the new mean

\[ \frac{1}{2nk^2}\cdot\frac{1}{p}\left[1 - \frac{2k\lambda\,e^{-k^2\lambda^2}}{p\sqrt{\pi}}\right]. \]

This is less than the square of the mean error of the old
mean by the second factor. Unfortunately we have no one
to tell us which observations we ought certainly to reject.
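
Bertrand's result may be checked numerically. The sketch below evaluates the mean square of an error confined within $\pm\lambda$, once from the closed expression $\frac{1}{2k^2}\left[1 - \frac{2k\lambda e^{-k^2\lambda^2}}{p\sqrt{\pi}}\right]$ with $p = \Theta(k\lambda)$, and once by quadrature, and then forms the factor by which the square of the mean error of the mean is altered. The standard error function stands for $\Theta$; the Simpson rule and the function names are our own.

```python
import math


def truncated_mean_square(k, lam):
    """Return p = erf(k * lam) and the closed-form mean square of |error| < lam."""
    p = math.erf(k * lam)
    closed = (1.0 - 2.0 * k * lam * math.exp(-(k * lam) ** 2)
              / (p * math.sqrt(math.pi))) / (2.0 * k * k)
    return p, closed


def truncated_mean_square_numeric(k, lam, steps=20000):
    """The same mean square by Simpson's rule (steps must be even)."""
    p = math.erf(k * lam)
    h = lam / steps

    def integrand(x):
        return x * x * math.exp(-(k * x) ** 2)

    total = integrand(0.0) + integrand(lam)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return 2.0 * k / (p * math.sqrt(math.pi)) * total * h / 3.0


def reduction_factor(k, lam):
    """Ratio of the new squared mean error of the mean to the old one, 1/(2 n k^2)."""
    p, mean_square = truncated_mean_square(k, lam)
    return 2.0 * k * k * mean_square / p


if __name__ == "__main__":
    for k_lam in (0.5, 1.0, 1.5, 2.0):
        print(k_lam, reduction_factor(1.0, k_lam))
```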

A more natural proceeding is to assume that in a few cases
there has been at work a disturbing cause, not usually present.
The first writer to attack the question from this point of
view was Benjamin Peirce.* He set himself the following
general problem. Given N observations, and a proposed
number to be rejected n, what is the numerical limit of error
that makes it more likely that the n observations whose
residuals exceed this arose from a disturbing cause than from
the operation of the natural laws at work in the other
cases? Peirce's solution is highly attractive. He frames two
hypotheses, first that there was no disturbing cause, second
that there was one. For the first hypothesis he calculates
the probability that n observations should give errors as
large as the suspicious ones, and that the other observations
should give just the errors committed, multiplying the two
probabilities together. For the second hypothesis he rejects
these observations in toto, recalculates his precision, and the
probability of making just the other errors. This he multiplies
by the a priori probability that n observations should be
disturbed and the others should not.

* Peirce, 'Criterion for the Rejection of Doubtful Observations',
Astronomical Journal, vol. ii, 1852.

Peirce's paper aroused a good deal of discussion. It was
attacked by Airy * on the ground that no judgement should
be made as to errors a posteriori, but to this Winlock †
truthfully replied that the whole theory of errors was based
on just this ground. Other criticisms have been made, but
the real fault was never laid bare until many years later
when Stewart ‡ showed the absurdity of starting with a totally
unknown a priori probability, and calculating it by assuming
that it took its maximum.

A simpler rule than Peirce's was devised by Chauvenet.§
The number of observations being N, if the probability of an
error numerically greater than $\epsilon$ be p, then Np will be about
the number of errors numerically above $\epsilon$. If we set this
equal to ½, and calculate $\epsilon$, we are unlikely to have an error
as large as that numerically, and larger errors should be
rejected. There are various possible objections to this, one
obvious one being that the calculus of probability deals with
ratios, not with actual numbers. The number of errors of
a given size will not be Np, but $Np \pm d$, where d is an
unknown number, small compared with N.
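
Chauvenet's limit is readily computed when the Gauss law is taken for granted. The sketch below solves $N\,[1 - \Theta(k\epsilon)] = \tfrac12$ for $\epsilon$ by bisection, using the standard complementary error function; the function name and the bracketing interval are our own choices.

```python
import math


def chauvenet_limit(n_obs, k):
    """Error eps for which n_obs * P(|error| > eps) = 1/2 under the Gauss law.

    P(|error| > eps) is taken as erfc(k * eps), k being the precision.
    """
    target = 0.5 / n_obs
    lo, hi = 0.0, 10.0  # bracket for t = k * eps
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if math.erfc(mid) > target:
            lo = mid
        else:
            hi = mid
    return lo / k


if __name__ == "__main__":
    k = 1.0
    probable_error = 0.4769 / k
    for n_obs in (10, 20, 50, 100):
        eps = chauvenet_limit(n_obs, k)
        # Limit expressed as a multiple of the probable error.
        print(n_obs, eps / probable_error)
```

The limit grows slowly with N, so that a longer series tolerates larger errors, a feature which bears on the objections raised in the text.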

A totally different method of attack was devised by Stone.||
His idea was that each observer erred grossly in a certain
proportion of his observations. If the probability that the
error of an observation should be as large as a certain number
be less than the probability that one of the N observations
should be affected by the observer's personal idiosyncrasy,
the observation should be rejected. There are two convincing
objections to this method of procedure. One is that we have
no exact knowledge of just how often an individual will
err in this way. The other is that, after we have calculated
the limit of acceptable observations for a series of N, and
find that perhaps one observation should be rejected, we
might, instead of rejecting this observation, keep on, and
observe N more times, with no worse result. On the basis of
the 2N, the observation which was suspicious before may now
be acceptable.

* Airy, 'Remarks on Peirce's Criterion', Astronomical Journal, vol. iv, 1856.
† Winlock, 'Airy's Objections to Peirce's Criterion', ibid.
‡ Stewart, 'Peirce's Criterion', Popular Astronomy, vol. xxviii, 1920.
§ Chauvenet, Astronomy, vol. i, 1863, p. 558.
|| Stone, 'Rejection of Discordant Observations', Monthly Notices R. Astr. Soc.,
vol. xxviii, 1868, xxxiv, 1874; and xxxv, 1875.

Stone's proposal led him into rather an unedifying dispute
with Glaisher, who proposed a method of his own.* His
idea was to weight the various observations, deducing their
weights by a method of successive approximations. Start
in the usual way, and calculate the precision. Assuming the
Gaussian law of error, this enables us to calculate the respective
probabilities that the given series resulted from
a true value equal to the first, the second ... the last of the
given values. We next give to each observation a weight
proportional to the square of the corresponding probability,
find the new weighted mean and corresponding precision, and
begin over again. Glaisher assumes that eventually this
process will approach to a definite limit. It might well be
very long. Moreover, it seems to involve a certain petitio
principii. For the weight attached to an observation is
proportional to the square of the probability that the series
arose from this true value when all the observations are
equally trustworthy, and is a meaningless coefficient if they
be otherwise.

* Glaisher, 'On the Rejection of Discordant Observations', ibid., xxxiii
and xxxiv.

A number of critics have maintained that, a priori, it is
quite inadmissible to reject any one of a set of observations
when all are carried out with the same care. Our own view
is that such caution is excessive. It is all a question in the
probability of causes. Here is an observation, far away from
the mean of the others. It may have arisen from the same
causes which were operative in the other cases, there may
have been a disturbing cause. Let $\pi_1$ be the a priori probability
that all was as usual when this observation was
taken, $\pi_2$ the a priori probability that there was a disturbing
element, tending to favour this result. We do not know the
value of either of these, but may safely assume that the first
is considerably the larger. Let $p_1$ be the probability that
the particular measurement would be made in the natural
course, $p_2$ that the special disturbing element might produce
it. This latter we do not know, but may assume it large.
$p_1$ we can calculate. To compare the two hypotheses by
Bayes' principle we must look at the fraction

\[ \frac{\pi_1 p_1}{\pi_2 p_2}. \]

If $p_1$ be infinitesimally small, in spite of the likelihood that
$\pi_1$ is considerably larger than $\pi_2$, there is much reason to
suspect that the fraction is small, and the observation should
be rejected. It is the same principle we discussed in Ch. VI,
p. 95, in discussing the lawsuit over a roulette wheel.

The delicate point is the probability $p_1$. The safest plan
is to calculate $1 - p_1$, the probability that no one of the n
observations should vary so widely as the most suspicious
observation made. If this be as large as a fixed large
probability P, there will be strong grounds for the belief that
the worst observation did not arise in the natural course, and
that it should, consequently, be rejected. Analytically, the
probability that no error will be numerically above $r\epsilon$, where
$\epsilon$ is the probable error, is

\[ [\Theta(0.4769\,r)]^n = P. \]

Given P = 0.99, n = 30.

We get r = 5.

A residual 5 times the probable error is, here, suspicious.
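
The multiple r may be found for other values of P and n by the same rule. The sketch below solves $[\Theta(0.4769\,r)]^n = P$ by bisection, with the standard error function in the place of $\Theta$; the function name and the bracketing interval are our own.

```python
import math

RHO = 0.4769  # product of the precision and the probable error


def suspicious_multiple(n_obs, big_p):
    """Multiple r of the probable error for which erf(RHO * r) ** n_obs = big_p."""
    target = big_p ** (1.0 / n_obs)
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if math.erf(RHO * mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    # The case of the text: P = 0.99 and n = 30 give r a little above 5.
    print(suspicious_multiple(30, 0.99))
```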
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CHAPTER  VIII 

ERRORS  IN  MANY  VARIABLES 

§  1.    The  Law  of  Error.* 

In  all  of  the  work  done  so  far,  we  have  tacitly  assumed 
that  we  were  studying  errors  in  the  observation  of  a  single 
variable  quantity.  There  are,  however,  cases  where  it  is 
interesting  and  important  to  observe  groups  of  quantities, 
and  the  corresponding  groups  of  errors,  in  other  words,  error 
in  measurements  involving  many  independent  variables.  Our 
present  task  is  to  establish  a  plausible  rule  for  the  distribution 
of  accidental  errors  in  such  cases. 

We must say, by way of preface, a word or two on the
matter of notation. The strictly scientific method would be
to use a system of double subscripts, the one to indicate the
quantity, the other the observation. The resulting formulae
would be compact, but would lack clearness. We assume,
therefore, that we have n sets of measurements of m independent
variables

\[ (x_1, y_1, z_1, \ldots),\ (x_2, y_2, z_2, \ldots),\ \ldots,\ (x_n, y_n, z_n, \ldots). \]

The true values shall be X, Y, Z, .... The true errors shall be

\[ (\xi_1, \eta_1, \zeta_1, \ldots),\ (\xi_2, \eta_2, \zeta_2, \ldots),\ \ldots,\ (\xi_n, \eta_n, \zeta_n, \ldots). \]

Assumption 1] The mean value of an individual accidental
error is zero.
This is certainly plausible, for a contrary assumption would
involve a tendency towards a positive or negative error,
which should be classed with the constant errors.

* The present section, in so far as it deals with any number of variables,
is taken direct from an article by the Author, 'The Gaussian Law of
Error for Any Number of Variables', Transactions American Math. Soc., 1923.
Apparently the only other treatment is that of Von Mises, 'Fundamentalsätze
der Wahrscheinlichkeitsrechnung', Math. Zeitschrift, vol. iv, 1919, and
'Grundlagen der W.', ibid., vol. v, 1920. See also Dodd, 'Functions of
Measurements', Särtryck ur Skandinavisk Aktuarietidskrift, Upsala, 1922.

Postulate 1] The Postulates 1]-5] for the best value of a
single observed quantity hold for each quantity of the
group.

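We have for each of our quantities, exactly the assumptions
for one quantity which were set up in the last chapter. We
may thus write our best values in the form:

\[ \bar{x} = \frac{\sum p_i x_i}{\sum p_i}, \qquad \bar{y} = \frac{\sum q_i y_i}{\sum q_i}, \qquad \bar{z} = \frac{\sum r_i z_i}{\sum r_i}, \ \ldots \tag{1} \]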

Theorem  1]  When  a  set  of  observations  are  made  under  the 
conditions  of  Assumptions  1-3,  the  best  value  for  each 
quantity  is  a  weighted  mean. 

Theorem  2]  When  all  observations  of  one  quantity  are  equally 
trustworthy,  the  best  value  is  their  average. 

We  shall,  in  future,  use  the  words  weighted  mean  in  place 
of  best  value,  the  coefficients  being  the  weights. 

Theorem 3] If it be possible to express each measurement as
an average of a certain number of standard observations,
then the weights in the weighted mean are
proportional to the numbers of standard observations
in each case.

Theorem 4] If the mean error for the observation $x_i$ be $1/k_i\sqrt{2}$,
the mean error for the weighted mean will be

\[ \frac{\left(\sum \dfrac{p_i^2}{2k_i^2}\right)^{1/2}}{\sum p_i}. \tag{2} \]

Theorem 5] The weights in the weighted mean are inversely
proportional to the squares of the corresponding mean
errors.

Let the residuals corresponding to the true errors $\xi, \eta, \zeta, \ldots$
be $\delta, \epsilon, \ldots$.

Assumption 2] When the number of observations is large, the
mean value of each of the expressions such as

^Pi^,      ^lill,      ^Pihli  (s) 

may  be  replaced  by  its  obt>erved  value. 
Theorem  6]  The  mean  errors  of  tJie  iveighted  means  x,y,z  ,,,  are 

\J2K''       \/{n-l)Spi'    \l2L''       V  (n--l)^pi' 
respectively. 
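Theorem 7] The mean errors of the individual observations are

$$\frac{1}{\sqrt{2}\,k_k} = \sqrt{\frac{\sum p_i \delta_i^2}{(n-1)\,p_k}}, \qquad \frac{1}{\sqrt{2}\,l_k} = \sqrt{\frac{\sum q_i \epsilon_i^2}{(n-1)\,q_k}}, \ \dots$$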


Theorem 8] When $x_i$ is the average of $p_i$ standard observations,
the mean error of one of these is

$$\sqrt{\frac{\sum p_i \delta_i^2}{n-1}}.$$
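As a hedged illustration of Theorems 6] to 8], the sketch below estimates the mean errors from the residuals of one weighted series. It assumes the usual forms, in which the weighted sum of squared residuals is divided by (n - 1); the function name and sample data are hypothetical.

```python
import math

def mean_errors_from_residuals(values, weights):
    """Estimate mean errors from the residuals of a weighted series."""
    n = len(values)
    total = sum(weights)
    mean = sum(p * x for p, x in zip(weights, values)) / total
    # Weighted sum of squared residuals about the weighted mean.
    s = sum(p * (x - mean) ** 2 for p, x in zip(weights, values))
    err_mean = math.sqrt(s / ((n - 1) * total))                 # weighted mean
    err_each = [math.sqrt(s / ((n - 1) * p)) for p in weights]  # each observation
    err_unit = math.sqrt(s / (n - 1))                           # observation of weight 1
    return mean, err_mean, err_each, err_unit

print(mean_errors_from_residuals([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```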


We  must  now  try  to  develop  a  law  of  error  for  our  groups 
of  observations.  For  the  sake  of  simplicity,  we  shall  assume 
all  groups  are  equally  trustworthy,  so  that  all  are  weighted 
alike. 

Assumption 3] The a priori probability that a group of
quantities to be measured should take values in the
infinitesimal region

$$X \pm \tfrac{1}{2}dX;\quad Y \pm \tfrac{1}{2}dY;\quad Z \pm \tfrac{1}{2}dZ \dots,$$

where the point X, Y, Z, ... lies in a continuous m
dimensional region S, will differ by an infinitesimal
of higher order from

$$f(X, Y, Z, \dots)\,dX\,dY\,dZ \dots$$

where the function f is continuous, with continuous
first derivatives in S.
Assumption 4] The probability that a group of quantities
whose true values are X, Y, Z, ... in S, should be
observed, after the removal of constant errors, to have
values in the infinitesimal region of S

$$x \pm \tfrac{1}{2}dx,\quad y \pm \tfrac{1}{2}dy,\quad z \pm \tfrac{1}{2}dz \dots,$$

will differ by an infinitesimal of higher order from

$$\Phi(X, Y, Z, \dots x, y, z, \dots)\,dx\,dy\,dz \dots$$

where $\Phi$ is a function continuous in all of its arguments,
and with continuous first and second partial
derivatives, and is independent of the origin.

Assumption 5] If the infinitesimal increments be sufficiently
small, the probability that the true values lie in the
infinitesimal region

$$\bar{x} \pm \tfrac{1}{2}dX,\quad \bar{y} \pm \tfrac{1}{2}dY,\quad \bar{z} \pm \tfrac{1}{2}dZ \dots$$

is greater than that they lie in any other such region.

We have now a sufficient number of assumptions to determine
the form of our function. The fact that $\Phi$ is independent
of the origin enables us to write

$$\Phi(X, Y, Z, \dots x, y, z, \dots) = \phi(\xi, \eta, \zeta, \dots).$$

Let  us  further  write 

The probability that the observations were made on a group
with the true values X, Y, Z, ... will be

$$\frac{f(X, Y, Z, \dots)\,\phi_1 \phi_2 \cdots \phi_n\,dX\,dY\,dZ \dots}{\displaystyle\int\!\cdots\!\int f(X, Y, Z, \dots)\,\phi_1 \phi_2 \cdots \phi_n\,dX\,dY\,dZ \dots} \tag{4}$$


This will be a maximum with the logarithm of the numerator.
Taking the partial derivatives


1    i)/  ^^^og^^^^^^Mog^^^^  ^^^ 


/c)Fi  ^T/i  '"  <iT}n 
Since f is independent of the observations, in the particular
case where each set is exactly right

1   ^/  ^\og4> 

1     <if  c)loff(/)  ,    , 

f  =  const. 

1   ^(t).        1    J)</).,                 1    Tid)^ 
—  H -— i-i  +       H ^  =  0 

1  ^  +  i  ^Jb  +       +1^  =  0.  (7) 

*  <  • 

These  equations  exist  when 

$x = X;\quad y = Y;\quad z = Z$, &c.

9i'7i  +  9'2'72+--- +9n'7n  =  0, 

Now let $x_1, x_2, \dots x_n$ take infinitesimal increments, subject
to (1). We have


;[^'^]"^'*4[s'-?:]"^--4;[i'4-:]^<-=°- 


i^l^ll+^2^^2+---+i'n^^«=  ^• 
Since  those  which  precede  the  last  must  hold  whenever 
the  last  does, 

Pi 


-  ^  [^ '^1  =  a  (^=  1,2,  ....), 
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1    i)  r  1  ^(^,.-|       1    ^  r  1  'dcjyi 


PiH 


$$\phi = R\,e^{-\psi^2(\xi, \eta, \zeta, \dots)}. \tag{8}$$

Here $\psi^2$ is a quadratic function, necessarily homogeneous,
for we have already seen that the partial derivatives vanish
when the arguments are all zero. Moreover, its discriminant
is not zero, for if it were, the partial derivatives would be
linearly dependent, and vanish for an infinite number of real
sets of the variables, and this is in direct conflict with our
assumption that the maximum arises only from taking all the
values equal to zero. Moreover, since this is known to be
a maximum, this form must be definite and positive, since
otherwise the maximum would be attained at infinity:

The homogeneous quadratic form $\psi^2$ is definite and positive
with a non-vanishing discriminant.

We next notice that all the work done so far has been in
a certain region S. We have found the probability that an
observation in S should lie in a certain infinitesimal sub-region.
What is the region S? It could not be the whole
of space, as the assumption that f is everywhere a constant
will lead to the absurd conclusion

$$\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} f(X, Y, Z, \dots)\,dX\,dY\,dZ \dots = 1.$$

On further consideration two more facts appear. First, it
seems plausible to assume that f is constant throughout
a certain region, and drops away very rapidly outside of it.
Second, the expression (8) is extremely small outside of a very
restricted part of space. As this will produce a result like
that produced by the disappearance of f, the error in calculating
the constants will be small if we allow S to extend
throughout all space.
Assumption 6] For the sake of calculating constants, formula
(8) may be assumed true everywhere.

We have also at our disposal Assumption 2], and this with
6] will be enough to solve the problem. As, however, the
solution is rather long, we shall begin with the case of two
variables, and assume that all observations are equally
trustworthy. Suppose that the law of error is expressed
by the equation

$$\phi = R\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}. \tag{9}$$

The curves $a\xi^2 + 2b\xi\eta + c\eta^2 = \text{const.}$

are curves of like probability, and cannot run off to infinity
by Assumption 2]. Hence these curves must be ellipses and

$$b^2 - ac < 0.$$

By a rotation of the $\xi$, $\eta$ plane about the origin, these
ellipses may be written

$$a'\xi'^2 + c'\eta'^2 = \text{const.}$$

The theory of invariants for conics shows us that

$$a + c = a' + c'; \qquad b^2 - ac = -a'c'. \tag{10}$$

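Since the sum of all probabilities is unity

$$1 = R\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta = R\int_{-\infty}^{\infty} e^{-a'\xi'^2}\,d\xi' \int_{-\infty}^{\infty} e^{-c'\eta'^2}\,d\eta',$$

$$R = \frac{\sqrt{a'c'}}{\pi} = \frac{\sqrt{ac - b^2}}{\pi}.$$

We find, by Assumption 2],

$$\frac{\sqrt{ac - b^2}}{\pi}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \xi^2\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta = \frac{\sum \delta_i^2}{n-1},$$

$$\frac{\sqrt{ac - b^2}}{\pi}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \eta^2\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta = \frac{\sum \epsilon_i^2}{n-1},$$

$$\frac{\sqrt{ac - b^2}}{\pi}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \xi\eta\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta = \frac{\sum \delta_i\epsilon_i}{n-1}.$$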
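To solve the first of these, we write

$$a\xi^2 + 2b\xi\eta + c\eta^2 = \frac{ac - b^2}{c}\,\xi'^2 + \eta'^2, \qquad \xi' = \xi, \qquad \eta' = \frac{b}{\sqrt{c}}\,\xi + \sqrt{c}\,\eta.$$

Changing variables, we have

$$\frac{\sqrt{ac - b^2}}{\pi\sqrt{c}}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \xi'^2\,e^{-\frac{ac - b^2}{c}\xi'^2}\,e^{-\eta'^2}\,d\xi'\,d\eta' = \frac{\sqrt{ac - b^2}}{\sqrt{\pi}\,\sqrt{c}}\int_{-\infty}^{\infty} \xi'^2\,e^{-\frac{ac - b^2}{c}\xi'^2}\,d\xi' = \frac{c}{2(ac - b^2)}.$$

Similarly

$$\frac{\sqrt{ac - b^2}}{\pi}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \eta^2\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta = \frac{a}{2(ac - b^2)}.$$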
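There remains

$$\frac{\sqrt{ac - b^2}}{\pi}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \xi\eta\,e^{-(a\xi^2 + 2b\xi\eta + c\eta^2)}\,d\xi\,d\eta.$$

Putting, as before,

$$\xi' = \xi, \qquad \eta' = \frac{b}{\sqrt{c}}\,\xi + \sqrt{c}\,\eta,$$

since

$$\int_{-\infty}^{\infty} \eta'\,e^{-\eta'^2}\,d\eta' = 0,$$

we have

$$\frac{c}{2(ac - b^2)} = \frac{\sum \delta_i^2}{n-1}, \qquad \frac{a}{2(ac - b^2)} = \frac{\sum \epsilon_i^2}{n-1}, \qquad \frac{-b}{2(ac - b^2)} = \frac{\sum \delta_i\epsilon_i}{n-1},$$

$$a = \frac{(n-1)\sum \epsilon_i^2}{2\left[\sum \delta_i^2 \sum \epsilon_i^2 - \left(\sum \delta_i\epsilon_i\right)^2\right]}, \quad b = \frac{-(n-1)\sum \delta_i\epsilon_i}{2\left[\sum \delta_i^2 \sum \epsilon_i^2 - \left(\sum \delta_i\epsilon_i\right)^2\right]}, \quad c = \frac{(n-1)\sum \delta_i^2}{2\left[\sum \delta_i^2 \sum \epsilon_i^2 - \left(\sum \delta_i\epsilon_i\right)^2\right]}, \tag{11}$$

$$R = \frac{n-1}{2\pi\sqrt{\sum \delta_i^2 \sum \epsilon_i^2 - \left(\sum \delta_i\epsilon_i\right)^2}}.$$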

The problem of finding the actual coefficients in the
general case* does not seem to lend itself to an analogous
method. We therefore take up the question from the start;
the theoretical importance of the problem seems sufficient to
warrant the labour involved. We shall begin by a change of
notation in order to use forms which are frequent in the
study of linear transformations. Let us write

so that (8) becomes

$$\phi = R\,e^{-\sum_{i,j} a_{ij} x_i x_j}, \qquad a_{ij} = a_{ji}. \tag{12}$$

* Cf. Greiner, Zeitschrift für Mathematik und Physik, vol. lvii, p. 226; and
Pearson, Philosophical Transactions, vol. clxxxvii, p. 299 ff.
We must first consider the discriminant

Since  the  quadratic  form  is  definite,  this  discriminant  is 
not  0,  and  we  may  find  such  a  linear  transformation 

k  =  ?)i 

t=  2  (^ik^k^     kiyl^O,  (13) 


«i 


k=l 


i,  j,  I;  I  =  m 

that  ^aijXiXj=      2      ^ijCik^^jl^k  ^i 


r  =  in 


2  h,x;\  h,  #  0. 


r  =  1 


Hence  ^   o.ij^ih<^ji  =  ^^    ^  ^  ^'  (1*) 


i, }  -  m 


^A'"Ki=\Cij\^'\^'ij\' 

The  inverse  of  the  substitution  contragredient  to  (1 3)  is 


I  —  ill 


'^k  =  2  ('ik'^^i 


1=1 


2, 7  ^  m 


V=  2  Cikf^jk'^i^j' 

i,j=  1 

The  hyperquadric  in  (ti—  1)  dimensional  space 

2  ^ij^i^j  =  ^' 

has  the  tangential  equation 
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In  the  new  variables  the  point  equation 

k  =  lU 

2  h^k'  =  0 


k  =  \ 


corresponds  to  the  tangential  equation 


h  =  'III 


'2 


2^  =  0. 


/•  =  i 


h 


/.•  —  111 


hj^  «i 


i  =  \   ^  i,j=i 

k-  =  ni 


(15) 


nm 


As  a  second  step  in  our  development,  let  us  consider  the 
residuals 

^11' °12»  .••  <^i„i;    ^21J  ^22'  •••  ^27n5   .••  ;    ^nl5^«25...^ 

We write for brevity

k  =  ',.. 

2  hihj 


Pa 


k=l 


'J  71-1 

We  get  from  Assumption  2] 


(16) 


Pij  =  -R 


pOO 


-00  J 


,  — Sa.x.x 


cc^ X' e    ^ ^ij  I  J dx^,,.dXjn,  (17) 


Let  us  change  variables,  remembering  that  the  Jacobian 

J_CO  J-O0,.,^j 

Now,  when  k  =^  I, 

poo 

J  — 00 

Hence 


e     *^^*  dxj/ 


.00       I'  =  ?rt 


/^  -  h^t^  .7^  ' 


ic^'e     *'*   dx{  =  0. 


2  (^ik^^ik^k^^    V'''''"dxy'.,,dx^\ 


k=  1 
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HI 


We  know,  further,  that 


e-^'^'^'^dx/ 


V  TT 

767.' 


X 


2  ih)^ 


Hence       Pij  -  \  c^j  \ 


R 


TT-' 


k:  =  hi 


C-iJ\ 


Hence,  by  (15) 


^  V  o  I  .. 


TT 


Rn^  A 


i^v 


^./ . 


2|rt 


y 


Furthermore, 


1  =  R 

Rw^  \c 


—  "^a.  X  X 


V 


RtA 


V 


Dividing  out 


Vi} 


—  _4iL 

2  \a 


V 


Here the quantities $p_{ij}$ are known. We wish to solve for
the unknown $a_{ij}$'s. We first write

V 


%= 


*.? '. 


Since  the  process  of  interchanging  each  element  of  a  non- 
vanishing  determinant  with  its  cofactor  is  an  involutory  one, 
except  for  multiplication  by  a  positive  or  negative  power  of 
the  determinant,  we  shall  evidently  have 
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W— 1 

1  > 


2  !»••  I 
We  thus  reach  our  final  equations 


.— 2a.  xa;. 


<^  =  Re    ^y  ^"j 


1  t)  log  I  w,. .  I  ,     , 


§  2.    The  Error  Ellipse. 

Let us return to a more careful study of the case where
there are but two quantities in a group. The curves

$$a\xi^2 + 2b\xi\eta + c\eta^2 = a'\xi'^2 + c'\eta'^2 = H \tag{19}$$

are ellipses, and are called error ellipses or ellipses of equal
probability. The meaning of the designation is easily seen.
If we take a small band on either side of such an ellipse, the
probability that the point representing a pair of values should
lie in a small region of this band is independent of the position
of the region with regard to the curve; the points should be
somewhat uniformly distributed throughout the band. To
study this ellipse, we must return to our equations of transformation

$$a + c = a' + c', \qquad b^2 - ac = -a'c';$$

the relation between the two sets of variables is

$$\xi' = \xi\cos\theta + \eta\sin\theta, \qquad \eta' = -\xi\sin\theta + \eta\cos\theta,$$

$$\tan 2\theta = -\frac{2b}{c - a},$$

$$c' = \frac{a + c}{2} - b\csc 2\theta, \qquad a' = \frac{a + c}{2} + b\csc 2\theta, \tag{20}$$

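the semi-axes of the ellipse are

$$\sqrt{\frac{H}{a'}}, \qquad \sqrt{\frac{H}{c'}}.$$

Its area is

$$\frac{\pi H}{\sqrt{a'c'}} = \frac{\pi H}{\sqrt{ac - b^2}}.$$

The probability of being between adjacent ellipses is

$$\frac{K\pi}{\sqrt{ac - b^2}}\,e^{-H}\,dH.$$

To find K we have

$$1 = \frac{K\pi}{\sqrt{ac - b^2}}\int_0^{\infty} e^{-H}\,dH, \qquad K = \frac{\sqrt{ac - b^2}}{\pi}.$$

The probability of being in a small band is

$$e^{-H}\,dH.$$

The probability of falling outside an ellipse $H_1$ is

$$\int_{H_1}^{\infty} e^{-H}\,dH = e^{-H_1}.$$

About one-half of the points should be without the ellipse

$$e^{-H_1} = \tfrac{1}{2}, \qquad H_1 = 0.6931.$$

This is called the 'probable ellipse'. Its area is

$$\frac{0.6931\,\pi}{\sqrt{ac - b^2}}.$$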

In judging the performance of marksmen, it has been
suggested that they should be graded according to the smallness
of their error ellipses, i.e. the better marksman is the
one for whom $ac - b^2$ has the larger value. In extreme cases
this method may work badly.*

As an example of how such material may be handled, we
take a case that has perhaps more historical than mathematical
interest, the 'Big Bertha' shots that fell on Paris in 1918.
The number of shots is not very large, we take 100, which is
nearly the total, and they were not all fired from the same
spot. The major axis of the probable ellipse, as shown in
Fig. 4, does point in a general way in the direction whence
the firing came. This, of course, we should expect under any
circumstances. Errors in range are likely to show more
variation than errors in direction, and a set of shots which
took a circular distribution on a point-blank target would
take an elliptical one on a target not perpendicular to the
central curve. The details of the calculation used in finding
the probable ellipse are

Fig. 4.

* Cf. Bertrand, loc. cit., pp. 236 ff.

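$$n = 100,$$

$$\frac{\sum \delta_i^2}{99} = 3{,}270, \qquad \frac{\sum \delta_i\epsilon_i}{99} = 1{,}180, \qquad \frac{\sum \epsilon_i^2}{99} = 2{,}030,$$

$$a = \frac{203}{1{,}049{,}140}, \qquad b = \frac{-118}{1{,}049{,}140}, \qquad c = \frac{327}{1{,}049{,}140},$$

$$a = 0.000193, \qquad b = -0.000113, \qquad c = 0.000312,$$

$$\tan 2\theta = +1.9, \qquad \theta = 31^\circ,$$

$$a' = 0.000124, \qquad c' = 0.000380.$$

The axes of the probable ellipse are

$$\alpha = 75, \qquad \beta = 43.$$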

In the figure there are only 45 points within the probable
ellipse, but had it been just a little larger, fully one-half
would have been therein. The centre is close to the Louvre,
and the major axis passes close to the Gare de l'Est.
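The calculation just quoted can be retraced with a short sketch. It assumes relations (11) and (20) in the forms a = Σε²/(2D(n - 1)), tan 2θ = -2b/(c - a), a' = (a + c)/2 + b csc 2θ, and takes H = log 2 for the probable ellipse; the function name is illustrative, and the branch of θ is fixed by `atan2`, which is a choice of this sketch.

```python
import math

def probable_ellipse(sxx, syy, sxy):
    """Orientation and semi-axes of the probable ellipse.

    sxx, syy, sxy are sum(delta^2), sum(eps^2), sum(delta*eps),
    each divided by (n - 1). Assumes sxy != 0.
    """
    d = 2 * (sxx * syy - sxy ** 2)
    a, b, c = syy / d, -sxy / d, sxx / d
    theta = 0.5 * math.atan2(-2 * b, c - a)   # tan(2*theta) = -2b / (c - a)
    csc = 1 / math.sin(2 * theta)
    a1 = (a + c) / 2 + b * csc                # a'
    c1 = (a + c) / 2 - b * csc                # c'
    h = math.log(2)                           # e^{-H} = 1/2
    return math.degrees(theta), a1, c1, math.sqrt(h / a1), math.sqrt(h / c1)

theta, a1, c1, alpha, beta = probable_ellipse(3270.0, 2030.0, 1180.0)
print(theta, a1, c1, alpha, beta)
```

The printed values agree with those in the text to within the rounding used there.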

§  3.    The  Correlation  Coefficient. 

Until recently, the only interest attached to errors in two
variables was the ballistic one, but now new applications
have arisen in connexion with statistics. A fundamental
question in many sorts of statistical work, especially in
sociological and biological sciences, is whether two characteristics
which are noted in a large number of individuals are
connected in some way, or vary independently. It is evident
that the assimilation of such measurements and variations to
accidental errors of observation is very crude. The errors
here are committed by Nature as she varies one way or the
other from the average. Nevertheless, in a good many cases,
our Assumptions 1] to 6] do fit her methods of operating
with considerable closeness.

Suppose, then, that in the case of a number of individuals,
we measure the same pair of characteristics, and plot the
pairs of measurements as points in a plane. In order to
bring our notation into conformity with that used by statisticians,
we shall call the residuals $(x_1, y_1), (x_2, y_2), \dots (x_n, y_n)$. If
the two characteristics were so connected that the one increased
above the mean proportionately to the increase of the
other, the points plotted would lie on a line of positive slope;
if the increase of one were proportional to the decrease of
the other, the line would have a negative slope. If the
characteristics were completely independent of one another,
the mean value of their product would be 0, and the axes of
the ellipse would be the axes of x and y.

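We write in the usual statistical notation

$$\sigma_x^2 = \frac{\sum x_i^2}{n}, \qquad \sigma_y^2 = \frac{\sum y_i^2}{n}. \tag{21}$$

The number

$$r = \frac{\sum x_i y_i}{n\,\sigma_x\sigma_y} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}} \tag{22}$$

is called the correlation coefficient. We suppose n so large
that we may safely put

$$\frac{n-1}{n} = 1.$$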

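Under these circumstances, we have from (11) and (20),

$$a = \frac{\sigma_y^2}{2\sigma_x^2\sigma_y^2(1 - r^2)} = \frac{1}{2\sigma_x^2(1 - r^2)},$$

$$b = \frac{-r}{2\sigma_x\sigma_y(1 - r^2)},$$

$$c = \frac{1}{2\sigma_y^2(1 - r^2)},$$

$$\tan 2\theta = \frac{2r\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}, \qquad \csc 2\theta = \frac{\left((\sigma_x^2 + \sigma_y^2)^2 - 4\sigma_x^2\sigma_y^2(1 - r^2)\right)^{1/2}}{2r\sigma_x\sigma_y},$$

$$a',\ c' = \frac{\sigma_x^2 + \sigma_y^2 \pm \left((\sigma_x^2 + \sigma_y^2)^2 - 4\sigma_x^2\sigma_y^2(1 - r^2)\right)^{1/2}}{4\sigma_x^2\sigma_y^2(1 - r^2)}.$$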
When r is close to 1 or to -1 we say that we have
a strong positive or negative correlation. The difference
between a' and c' will be large, and the ratio of the axes will
be close to 0 or ∞. The ellipse will be excessively flat, and
the two sets of residuals will tend to vary proportionately.
On the other hand, when r is close to 0 we say the correlation
is weak; b will be very small. Then either θ will be very
small, and the probability function will be close to

$$\frac{1}{2\pi\sigma_x\sigma_y}\,e^{-\left(\frac{x^2}{2\sigma_x^2} + \frac{y^2}{2\sigma_y^2}\right)},$$

which is characteristic of independent variation, or else $\sigma_x$ is
nearly equal to $\sigma_y$, a' is nearly equal to c', and we have
nearly a circular distribution which would also give b
close to 0.
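A minimal sketch of the computation of r, and of the constants a, b, c expressed through σ_x, σ_y and r, is given here. It assumes the definitions σ_x² = Σx²/n, σ_y² = Σy²/n, r = Σxy/(nσ_xσ_y), with x and y measured from their means; the function names and the sample data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Standard deviations and correlation coefficient of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]          # residuals from the mean
    dy = [y - my for y in ys]
    sx = math.sqrt(sum(d * d for d in dx) / n)
    sy = math.sqrt(sum(d * d for d in dy) / n)
    r = sum(p * q for p, q in zip(dx, dy)) / (n * sx * sy)
    return sx, sy, r

def coefficients(sx, sy, r):
    """a, b, c of the error law in terms of sx, sy, r (requires |r| < 1)."""
    k = 2 * (1 - r * r)
    return 1 / (k * sx * sx), -r / (k * sx * sy), 1 / (k * sy * sy)

sx, sy, r = correlation([1, 2, 3, 4], [1, 3, 2, 4])
a, b, c = coefficients(sx, sy, r)
print(r, a, b, c)
```

As a consistency check, ac - b² should equal 1/(4σ_x²σ_y²(1 - r²)), which is the square of πR.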

It is fair to say that the usual method of arriving at the
correlation coefficient is quite different from this; for the sake
of completeness we sketch the customary proceeding.*

We start, as before, with the centre of gravity of the given
points as origin. There may be several x's corresponding to
each y. Thus we might have on one horizontal line the
points

$$(x_{i1}, y_i),\ (x_{i2}, y_i),\ \dots\ (x_{im_i}, y_i),$$

whose centre of gravity would be the point

$$\bar{x}_i = \frac{1}{m_i}\sum_j x_{ij}, \qquad y_i.$$

If now x and y varied proportionately to one another, all
of the points $(\bar{x}_i, y_i)$ would be collinear. When they are not,

* Cf. Yule, 'On the Significance of Bravais' Formula', Proceedings of the
Royal Society, vol. lx, 1896.
let us find what line does make the best graph of x as
a function of y. We shall call this a line of regression of x
on y. We mean by the best, that which will minimize the
sum of the squares of the weighted divergences of the values
x from the corresponding values of the function. Calling

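we must have

$$\sum_i m_i\left[\bar{x}_i - (k y_i + a)\right]^2 = \text{Min.}$$

Differentiating partially to a and k,

$$\sum_i m_i\left[\bar{x}_i - (k y_i + a)\right] = 0, \qquad \sum_i m_i y_i\left[\bar{x}_i - (k y_i + a)\right] = 0,$$

$$\sum_i m_i \bar{x}_i = \sum_i m_i y_i = \sum x_{ij} = 0.$$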

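Here the third summation covers all the abscissae. It must
not be forgotten that the origin is at the centre of gravity,

hence $a = 0$, $\quad k = \dfrac{\sum xy}{\sum y^2} = r\,\dfrac{\sigma_x}{\sigma_y}.$

The line of regression of x on y is

$$x = r\,\frac{\sigma_x}{\sigma_y}\,y.$$

In the same way the line of regression of y on x is

$$y = r\,\frac{\sigma_y}{\sigma_x}\,x.$$

The tangent of the included angle is

$$\tan\phi = \frac{r^2 - 1}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}.$$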

When r is close to 1 or -1, the angle between the two
lines of regression is close to 0 or π. Moreover, the sum of
the squares of the areas formed by the various pairs of points
with the origin is

$$\tfrac{1}{4}\sum_{i<j}(x_i y_j - x_j y_i)^2 = \tfrac{1}{4}\left[\sum x_i^2 \sum y_i^2 - \left(\sum x_i y_i\right)^2\right] = \frac{n^2}{4}\,\sigma_x^2\sigma_y^2(1 - r^2).$$

When  r  is  close  to  1  or  —  1 ,  this  will  be  close  to  0,  so  that 
all  of  the  points  will  lie  nearly  on  a  line  through  the  centre 
of  gravity.  On  the  other  hand,  when  r  is  close  to  0,  the  two 
lines  of  regression  are  close  to  the  axes.  For  each  y,  the 
average  x  is  about  0,  and  the  characteristics  are  practically 
independent  of  one  another. 
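The two lines of regression can be computed directly from the residuals, as in the following sketch. It assumes unit weights, so that the slope of x on y is Σxy/Σy² and that of y on x is Σxy/Σx²; the function name and the data are illustrative.

```python
def regression_lines(xs, ys):
    """Slopes of the two regression lines through the centre of gravity."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    sxy = sum(p * q for p, q in zip(dx, dy))
    k_x_on_y = sxy / syy              # x = k_x_on_y * y, equal to r*sx/sy
    k_y_on_x = sxy / sxx              # y = k_y_on_x * x, equal to r*sy/sx
    r = sxy / (sxx * syy) ** 0.5
    return k_x_on_y, k_y_on_x, r

kxy, kyx, r = regression_lines([1, 2, 3, 4], [2, 6, 4, 8])
print(kxy, kyx, r)
```

The product of the two slopes is r², so the lines coincide only when r is 1 or -1.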

The great trouble at present with the theory of correlation
seems to be that there is no general agreement as to how
large $r^2$ must be in order that we may safely conclude that
there is a real connexion between the two sets of phenomena.

Besides the correlation coefficient there is another number,
called the correlation ratio, which the statisticians sometimes
employ. We begin with a new system of coordinates which
is independent of the unit of measure,

$$x_i' = \frac{x_i}{\sigma_x}, \qquad y_i' = \frac{y_i}{\sigma_y}.$$

Then $\sum x_i'^2 = \sum y_i'^2 = n.$

Let us group these according to the y's as before. We
shall have on a horizontal line

$$(x_{i1}', y_i'),\ (x_{i2}', y_i'),\ \dots\ (x_{im_i}', y_i'), \qquad \sum_i m_i = n.$$

The  dispersion  of  the  abscissas  is 

VSixtj-x/f. 

The  total  dispersion  is 

ij  i 
This is called the correlation ratio of x' on y'. It is equal
to unity when, and only when, y is a single valued function
of x. If the points $(x_i', y_i')$ all lie on a straight line we have

x'  =  ly', 
7  _  ^'  _      //^K»/')\  _  , 

The  corresponding  points  xy  must  lie  in  the  line 


which  must  be  the  regression  of  x  on  y,  hence 


hxx'y'  =  r. 


CHAPTER  IX 

INDIRECT  OBSERVATIONS 

§ 1. Least Square Method for Combining Indirect Observations.

It frequently happens in physical measurements that we
are not able to make a direct examination of the quantities
which interest us, but must deduce their values from the
observations of certain functions of them. We are faced in
such cases with the problem of combining the observations
in such a fashion as will best help us to estimate the values
of the quantities in which we are interested.

We first ask a question in pure mathematics. Suppose
that we observe the values of n differentiable functions of m
variables, what are the values of the variables? There are
three obvious cases:

A)  n  <  m.  The  problem  is  indeterminate  if  the  functions 
be  independent ;  we  can  find  no  unique  solution  when  the 
number  of  equations  is  less  than  the  number  of  unknowns. 

B)  n  =  m.  The  problem  is  determinate  (usually)  and 
depends  upon  our  analytical  skill  in  solving  the  equations. 

C) n > m. If the observed values contained no error
whatever,  some  of  the  equations  would  result  from  the  others, 
and  we  should  fall  back  on  a  previous  case.  But,  owing  to 
accidental  errors  of  observation,  the  system  is  incompatible 
and  the  question  is,  '  What  do  we  propose  to  do  about  it  ?  ' 
An  easy  plan  would  be  to  discard  some  of  the  equations  and 
solve  the  others,  but  we  have  no  sure  guide  as  to  which 
equations  might  better  be  discarded,  and  we  should  certainly 
lose  some  accuracy,  just  as  it  is  less  accurate  to  take  a  single 
measurement  of  a  quantity  than  to  take  the  average  of 
several  discordant  measurements. 
In order to solve our problem, we must decide on the
meaning of the phrase 'best values for the unknowns'. This
we shall do as we proceed with our analysis. Suppose that
we have observed the values of n functions, not necessarily
distinct, of m variables. These observational equations shall be

$$f_i(u_1, u_2, \dots u_m) = x_i, \qquad i = 1, 2, \dots n. \tag{1}$$

The $x_i$ are observed values. Let us assume that we know
the weights of our observations, although we do not assume
that we know the probable errors. We assume also that the
number of equations is greater than the number of the unknowns,
and that the equations are inconsistent, so that we
cannot discard some and solve the rest.

Postulate 1] The best values for the unknowns are those which
will give a maximum value to the probability of
obtaining just this series of measurements.

Assumption 1] The error of each observation follows the
Gaussian law with a proper precision.

Assumption 2] We can make first approximations to the
unknowns so accurate that in correcting the values of
these, corrections above the first order may be neglected.

Let $X_i$ be the true value for the function $f_i$; the true error
shall be $x_i - X_i = \xi_i$ and the precision $k_i$. We may follow the
reasoning of the previous chapter, which shows that all values
of $u_1, u_2, \dots u_m$ are equally likely. Our problem then is to
maximize

$$e^{-(k_1^2\xi_1^2 + k_2^2\xi_2^2 + \dots + k_n^2\xi_n^2)}$$

and this amounts, in turn, to minimizing

$$k_1^2\xi_1^2 + k_2^2\xi_2^2 + \dots + k_n^2\xi_n^2. \tag{2}$$

If we had taken as one of our assumptions that requiring
this expression to be a minimum, we might have abandoned
the assumption that the measures followed the law of Gauss.
It will be well, in our present work, to use a large assortment
of symbols. Let us assume that the true values of the
unknowns are $U_1, U_2, \dots U_m$. These we do not know, and
never shall know. Let us assume that our good first approximations
are $\omega_1, \omega_2, \dots \omega_m$, and that their true errors are $\epsilon_1, \epsilon_2, \dots \epsilon_m$.
Then we have the true equations

$$f_i(\omega_1 + \epsilon_1, \omega_2 + \epsilon_2, \dots \omega_m + \epsilon_m) = X_i,$$

and by Assumption 2] these may be written

$$f_i(\omega_1, \omega_2, \dots \omega_m) + \sum_j \frac{\partial f_i}{\partial \omega_j}\,\epsilon_j = X_i = x_i - \xi_i,$$

hence the expression to minimize is

$$\sum_i k_i^2\left[x_i - f_i(\omega_1, \dots \omega_m) - \sum_j \frac{\partial f_i}{\partial \omega_j}\,\epsilon_j\right]^2. \tag{3}$$

We shall equate to 0 each of the partial derivatives of this
with regard to $\epsilon_1, \epsilon_2, \dots \epsilon_m$. We do not know the values of
the precisions, but we assumed that we knew the weights
which are proportional to their squares. Hence we have
m equations

$$\sum_i p_i\,\frac{\partial f_i}{\partial \omega_j}\left[x_i - f_i(\omega_1, \dots \omega_m) - \sum_k \frac{\partial f_i}{\partial \omega_k}\,\epsilon_k\right] = 0, \qquad j = 1, 2, \dots m. \tag{4}$$

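It is well to write these at length, using a symbolism which
is classical in this sort of work. We write, by definition,

$$\sum_i r_i s_i = [rs], \qquad \sum_i r_i s_i t_i = [rst].$$

Then we have m equations

$$\left[p\,\frac{\partial f}{\partial \omega_j}\frac{\partial f}{\partial \omega_1}\right]\epsilon_1 + \left[p\,\frac{\partial f}{\partial \omega_j}\frac{\partial f}{\partial \omega_2}\right]\epsilon_2 + \dots + \left[p\,\frac{\partial f}{\partial \omega_j}\frac{\partial f}{\partial \omega_m}\right]\epsilon_m = \left[p\,\frac{\partial f}{\partial \omega_j}(x - f)\right], \qquad j = 1, 2, \dots m. \tag{5}$$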
These equations are called the normal equations. The
principal difficulty is to remember how to write them down.
An easy way is as follows. We begin with n incompatible
equations, called residual equations,

$$\frac{\partial f_i}{\partial \omega_1}\,\epsilon_1 + \frac{\partial f_i}{\partial \omega_2}\,\epsilon_2 + \dots + \frac{\partial f_i}{\partial \omega_m}\,\epsilon_m = x_i - f_i(\omega_1, \dots \omega_m), \qquad i = 1, 2, \dots n.$$

These equations would become compatible if we replaced
the quantities $x_1, x_2, \dots x_n$ by their true values $X_1, X_2, \dots X_n$.
Multiply the first equation through by $p_1$ times the first
coefficient, the second by $p_2$ times the first coefficient, and so
on. The sum will be the first normal equation. For the kth
normal equation we use the kth coefficient each time.

The case which arises most often in practice is that where
the given functions are linear. Here we do not bother at all
about the first approximations $\omega_1, \omega_2, \dots \omega_m$, but take as
corrections the unknowns themselves. It makes for clearness,
also, to use a variety of letters, rather than to use double
subscripts. We write the residual equations

$$\begin{aligned} a_1 u_1 + b_1 u_2 + \dots + m_1 u_m &= x_1, \\ a_2 u_1 + b_2 u_2 + \dots + m_2 u_m &= x_2, \\ \dots \qquad & \\ a_n u_1 + b_n u_2 + \dots + m_n u_m &= x_n. \end{aligned} \tag{6}$$

The normal equations then are

$$\begin{aligned} [paa]\,u_1 + [pab]\,u_2 + \dots + [pam]\,u_m &= [pax], \\ [pba]\,u_1 + [pbb]\,u_2 + \dots + [pbm]\,u_m &= [pbx], \\ \dots \qquad & \\ [pma]\,u_1 + [pmb]\,u_2 + \dots + [pmm]\,u_m &= [pmx]. \end{aligned} \tag{7}$$

$$\left|\,[paa]\ [pbb]\ \dots\ [pmm]\,\right| = \Delta. \tag{8}$$

$$u_1 = \frac{1}{\Delta}\begin{vmatrix} [pax] & [pab] & \dots & [pam] \\ [pbx] & [pbb] & \dots & [pbm] \\ \dots & & & \\ [pmx] & [pmb] & \dots & [pmm] \end{vmatrix}. \tag{9}$$
At this point we must consider a troublesome little theoretical
difficulty which most writers on the present subject calmly
ignore, namely, the possibility that the denominator might
be 0. Fortunately this cannot happen. Let us replace the
normal equations by the true equations

$$\begin{aligned} \sqrt{p_1}\,a_1 U_1 + \sqrt{p_1}\,b_1 U_2 + \dots + \sqrt{p_1}\,m_1 U_m &= \sqrt{p_1}\,X_1, \\ \sqrt{p_2}\,a_2 U_1 + \sqrt{p_2}\,b_2 U_2 + \dots + \sqrt{p_2}\,m_2 U_m &= \sqrt{p_2}\,X_2, \\ \dots \qquad & \end{aligned}$$

These equations being true are certainly consistent, and the
determinant of a set of m of them cannot vanish in every
case unless there be not enough independent equations to
determine the U's. But $\Delta$ is the sum of the squares of these
m row determinants, hence it cannot vanish.
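The statement that Δ is the sum of the squares of the m-row determinants (the Cauchy-Binet identity) can be checked numerically on a small case. The sketch below is only an illustration; the helper names and the three sample equations are hypothetical.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion; adequate for small matrices."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total

def normal_matrix(rows, weights):
    """Matrix of coefficients [paa], [pab], ... of the normal equations."""
    m = len(rows[0])
    return [[sum(p * r[i] * r[j] for p, r in zip(weights, rows))
             for j in range(m)] for i in range(m)]

def sum_of_squared_minors(rows, weights):
    """Sum of squares of all m-row determinants of the sqrt(p)-scaled rows."""
    m = len(rows[0])
    scaled = [[p ** 0.5 * v for v in r] for p, r in zip(weights, rows)]
    return sum(det([scaled[i] for i in idx]) ** 2
               for idx in combinations(range(len(rows)), m))

rows = [[1, 0], [-1, 1], [0, 1]]   # three equations in two unknowns
weights = [1, 2, 1]
print(det(normal_matrix(rows, weights)), sum_of_squared_minors(rows, weights))
```

Both printed numbers should agree; being a sum of squares, the common value vanishes only if every m-row determinant does.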

As an example of how to work these processes in practice,
we take a problem in levelling:

A above O = 573.08 ft. B above A = 2.60 ft. D above
B = 170.28 ft. B above O = 575.27 ft. C above B = 167.33
ft. D above C = 3.80 ft. D above E = 425.0 ft. E above
O = 319.91 ft. E above O = 319.75 ft.

We assume that all observations are equally trustworthy,
and take the weights equal to 1. We have

Residual equations.          Normal equations.

 u_1        = 573.08          2u_1 -  u_2                      = 570.48
-u_1 + u_2  =   2.60          -u_1 + 4u_2 -  u_3 -  u_4        = 240.26
-u_2 + u_4  = 170.28                 - u_2 + 2u_3 -  u_4       = 163.53
 u_2        = 575.27                 - u_2 -  u_3 + 3u_4 - u_5 = 599.08
-u_2 + u_3  = 167.33                               - u_4 + 3u_5 = 214.66
-u_3 + u_4  =   3.80
 u_4 - u_5  = 425.0
 u_5        = 319.91
 u_5        = 319.75

Eliminate $u_5$ from the last two,

$$-3u_2 - 3u_3 + 8u_4 = 2011.90.$$

Eliminate $u_1$ from the first two,

$$7u_2 - 2u_3 - 2u_4 = 1051.$$

Double the third,

$$-2u_2 + 4u_3 - 2u_4 = 327.06.$$

Eliminate $u_4$ twice,

$$25u_2 - 11u_3 = 6215.90,$$

$$9u_2 - 6u_3 = 723.94.$$

Dividing by 3, $\quad 3u_2 - 2u_3 = 241.31.$

Multiply by 6 and subtract from the equation two places
above, $\quad 7u_2 + u_3 = 4768.04.$

Double and add to the preceding,

$$17u_2 = 9777.39,$$

$$u_2 = 575.14.$$

Hence, finally,

$$u_1 = 572.81, \quad u_2 = 575.14, \quad u_3 = 742.05, \quad u_4 = 745.43, \quad u_5 = 320.03.$$
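The levelling example can be checked mechanically. The sketch below forms the normal equations from the nine residual equations (unit weights) and solves them by ordinary Gaussian elimination with pivoting, which is a substitute for the hand elimination above and not the author's procedure; all function names are illustrative.

```python
def normal_equations(rows, rhs, weights):
    """Form [p a a] etc. and the right-hand sides [p a x] etc."""
    m = len(rows[0])
    mat = [[sum(p * r[i] * r[j] for p, r in zip(weights, rows))
            for j in range(m)] for i in range(m)]
    vec = [sum(p * r[i] * x for p, r, x in zip(weights, rows, rhs))
           for i in range(m)]
    return mat, vec

def solve(matrix, vector):
    """Gaussian elimination with partial pivoting, then back-substitution."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][k] * sol[k] for k in range(i + 1, n))
        sol[i] = s / aug[i][i]
    return sol

# Unknowns u_1..u_5 are the heights of A, B, C, D, E above O.
rows = [[1, 0, 0, 0, 0], [-1, 1, 0, 0, 0], [0, -1, 0, 1, 0],
        [0, 1, 0, 0, 0], [0, -1, 1, 0, 0], [0, 0, -1, 1, 0],
        [0, 0, 0, 1, -1], [0, 0, 0, 0, 1], [0, 0, 0, 0, 1]]
rhs = [573.08, 2.60, 170.28, 575.27, 167.33, 3.80, 425.0, 319.91, 319.75]
mat, vec = normal_equations(rows, rhs, [1.0] * len(rows))
heights = solve(mat, vec)
print([round(h, 2) for h in heights])
```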

In  this  particular  case  the  coefficients  in  the  normal 
equations  are  unusually  simple,  and  for  that  reason  these 
equations  are  easily  solved.  Unfortunately,  things  do  not 
always  turn  out  so  pleasantly.  We  must  exhibit  the  standard 
method  to  be  followed  in  the  usual  difficult  cases. 

Two remarks are necessary at the outset. The first is that
it  is  necessary  to  provide  some  check  on  our  work  as  we 
proceed.  The  second  that  the  determinant  of  the  coefficients 
in  the  normal  equations  is  symmetric.  In  consequence,  if  we 
write  all  that  lies  above  the  principal  diagonal,  we  know  the 
rest.  The  method  we  shall  pursue  has  two  characteristics. 
There  is  a  check  which  is  carried  along  automatically,  and 
each  time  we  get  a  new  set  of  equations  with  one  less  variable, 
the  determinant  of  the  coefficients  is  symmetric. 

Problem.

The following observations for level were made:

A above O = 115.52. B above A = 60.12. B above O = 177.04. C above
A = 234.12. C above B = 171.0. E above C = 682.25. E above D = 211.01.
D above B = 596.12. D above C = 427.18.

Find the various differences in level.

We first re-write the residual equations, putting the constant
on the left and calling it $r_i$. Then we suppress the
equality sign and put in a column of numbers $s_1, s_2, \dots s_n$, each
of which is the sum of the coefficients and constants in its
row; this column we call check.

$$\begin{array}{lll} a_1 u_1 + b_1 u_2 + \dots + m_1 u_m & r_1 & s_1, \\ a_2 u_1 + b_2 u_2 + \dots + m_2 u_m & r_2 & s_2, \\ \dots & & \end{array}$$

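From these we get the normal equations which we write
in a similar manner, omitting whatever is below the principal
diagonal,

$$
\begin{array}{lrrl}
[paa]u_1 + [pab]u_2 + [pac]u_3 + \dots + [pam]u_m & [par] & [pas] & \text{I}\\
\qquad\quad\;\; [pbb]u_2 + [pbc]u_3 + \dots + [pbm]u_m & [pbr] & [pbs] & \text{II}\\
\qquad\qquad\qquad\quad\;\; [pcc]u_3 + \dots + [pcm]u_m & [pcr] & [pcs] & \text{III}\\
\qquad\qquad\qquad\qquad\qquad\cdots & \cdots & \cdots & \\
\qquad\qquad\qquad\qquad\qquad\qquad\quad [pmm]u_m & [pmr] & [pms] & \text{M}
\end{array}
$$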

The last column is the check, and is the sum of the coefficients
and constants in its row. To check a row not written in
full, start at the top row and add downwards in any column
to the diagonal, then to the right. The sum should be the
check after the last term added. We next divide the first
equation by $[paa]$,

$$
\begin{array}{lrrl}
u_1 + \dfrac{[pab]}{[paa]}u_2 + \dfrac{[pac]}{[paa]}u_3 + \dots + \dfrac{[pam]}{[paa]}u_m & \dfrac{[par]}{[paa]} & \dfrac{[pas]}{[paa]} & \text{I}'
\end{array}
$$

We now manipulate equations I and I' as follows. We
multiply I' by the coefficient of $u_2$ in I and subtract from II,
we multiply I' by the coefficient of $u_3$ in I and subtract from
III, and so on. We get finally a new set of equations with
the following properties:

A) $u_1$ has been eliminated, so that there are $m-1$ equations
in as many unknowns.

B)  The  determinant  of  the  coefficients  is  symmetric. 

C)  The  term  in  the  last  column  checks  the  others. 

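These equations are identical in form with those numbered
I, II, ... M. We start again and eliminate another variable
in the same way.

As an example, let us try the equations we had before.

$$
\begin{array}{lrrl}
2u_1 - u_2 & -570.48 & -569.48 & \text{I}\\
4u_2 - u_3 - u_4 & -240.26 & -239.26 & \text{II}\\
2u_3 - u_4 & -163.53 & -163.53 & \text{III}\\
3u_4 - u_5 & -599.08 & -599.08 & \text{IV}\\
3u_5 & -214.66 & -212.66 & \text{V}\\
u_1 - \tfrac{1}{2}u_2 & -285.24 & -284.74 & \text{I}'\\[8pt]
\tfrac{7}{2}u_2 - u_3 - u_4 & -525.50 & -524.00 & \text{I}\\
2u_3 - u_4 & -163.53 & -163.53 & \text{II}\\
3u_4 - u_5 & -599.08 & -599.08 & \text{III}\\
3u_5 & -214.66 & -212.66 & \text{IV}\\
u_2 - \tfrac{2}{7}u_3 - \tfrac{2}{7}u_4 & -150.14 & -149.71 & \text{I}'\\[8pt]
\tfrac{12}{7}u_3 - \tfrac{9}{7}u_4 & -313.67 & -313.24 & \text{I}\\
\tfrac{19}{7}u_4 - u_5 & -749.22 & -748.79 & \text{II}\\
3u_5 & -214.66 & -212.66 & \text{III}\\
u_3 - \tfrac{3}{4}u_4 & -182.97 & -182.72 & \text{I}'\\[8pt]
\tfrac{7}{4}u_4 - u_5 & -984.47 & -983.72 & \text{I}\\
3u_5 & -214.66 & -212.66 & \text{II}\\
u_4 - \tfrac{4}{7}u_5 & -562.55 & -562.13 & \text{I}'
\end{array}
$$

$$\tfrac{17}{7}u_5 = 777.21.$$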


We thus get

$$u_1 = 572.81; \quad u_2 = 575.14; \quad u_3 = 742.06; \quad u_4 = 745.43; \quad u_5 = 320.03.$$
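The standard proceeding with its check column lends itself to mechanical treatment. What follows is a minimal sketch under stated assumptions: the full symmetric rows are stored (rather than only what lies above the diagonal), the constants are entered with the sign they carry in the tableau, and the name `eliminate_with_check` is hypothetical. At every stage the first row is divided by its leading coefficient to give I', multiples of I' are subtracted from the remaining rows, and the check entry of each new row is compared with the sum of its coefficients and constant.

```python
def eliminate_with_check(normal, consts, tol=1e-9):
    """Reduce  normal * u + consts = 0  one unknown at a time,
    carrying a check column equal to each row's sum."""
    rows = [list(map(float, r)) + [float(c)] for r, c in zip(normal, consts)]
    for row in rows:
        row.append(sum(row))               # check column s
    primed_rows = []                        # the rows I', one per stage
    while rows:
        first = rows[0]
        primed = [v / first[0] for v in first]
        primed_rows.append(primed)
        reduced = []
        for row in rows[1:]:
            # Subtract (coefficient of the eliminated unknown) * I'.
            new = [x - row[0] * p for x, p in zip(row, primed)][1:]
            # The carried check must equal the sum of what remains.
            assert abs(sum(new[:-1]) - new[-1]) < tol * max(1.0, abs(new[-1]))
            reduced.append(new)
        rows = reduced
    # Back-substitution through the primed rows.
    m = len(primed_rows)
    u = [0.0] * m
    for k in range(m - 1, -1, -1):
        p = primed_rows[k]                  # u_k + p[1] u_{k+1} + ... + const = 0
        coeffs, const = p[1:-2], p[-2]
        u[k] = -const - sum(c * u[k + 1 + j] for j, c in enumerate(coeffs))
    return u

NORMAL = [
    [2, -1, 0, 0, 0],
    [-1, 4, -1, -1, 0],
    [0, -1, 2, -1, 0],
    [0, -1, -1, 3, -1],
    [0, 0, 0, -1, 3],
]
CONSTS = [-570.48, -240.26, -163.53, -599.08, -214.66]

solution = eliminate_with_check(NORMAL, CONSTS)
print([round(v, 2) for v in solution])
```

Because the rows are kept in full, the check of each row is simply its own sum; with the triangular arrangement of the text the same number is obtained by adding down the column to the diagonal and then to the right.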

We  must  further  caution  the  reader  not  to  sit  in  the  seat  of 
the  scornful,  saying  that  this  method  turns  out  ever  so  much 
more  cumbersome  than  the  other.  So  it  does  in  the  present 
case,  and  so  it  often  will  when  the  coefficients  in  the  normal 
equations  are  particularly  simple.  It  is  wiser  in  such  cases 
to  solve  by  the  first  method  that  comes  to  hand.  But  when 
the coefficients are complicated, the computer who waits for
inspiration to find the best way to handle his equations, will
probably be ill inspired. Should the reader be anxious to
practise this standard proceeding on more complicated
equations, he will have no difficulty in finding equations that
will give him all the practice he desires.

We now turn to a question of great theoretical importance,
the weight to be attached to the solutions of the normal
equations. This calculation is so difficult that almost all
text-books omit it. The following development is the easiest
that we have seen. Let us begin by replacing our incorrect
residual equations by the true equations

$$
\begin{aligned}
a_1U_1 + b_1U_2 + \dots + m_1U_m &= X_1,\\
&\;\;\vdots\\
a_nU_1 + b_nU_2 + \dots + m_nU_m &= X_n.
\end{aligned}
\tag{10}
$$

We have also a set of true equations, analogous to the
normal equations

$$
\begin{aligned}
[paa]U_1 + [pab]U_2 + \dots + [pam]U_m &= [paX],\\
[pba]U_1 + [pbb]U_2 + \dots + [pbm]U_m &= [pbX],\\
&\;\;\vdots\\
[pma]U_1 + [pmb]U_2 + \dots + [pmm]U_m &= [pmX].
\end{aligned}
\tag{11}
$$

Corresponding to the true errors $\xi_i$ of the observed quantities,
we shall have true errors of the quantities to be computed,
namely,

$$\eta_j = U_j - u_j. \tag{12}$$

It must be noted that there are $n$ $\xi$'s but only $m$ $\eta$'s. From
(11) and (9)

$$
\begin{aligned}
[paa]\eta_1 + [pab]\eta_2 + \dots + [pam]\eta_m &= [pa\xi],\\
[pba]\eta_1 + [pbb]\eta_2 + \dots + [pbm]\eta_m &= [pb\xi],\\
&\;\;\vdots\\
[pma]\eta_1 + [pmb]\eta_2 + \dots + [pmm]\eta_m &= [pm\xi];
\end{aligned}
\tag{13}
$$

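the solutions are

$$\eta_1 = \frac{1}{\Delta}\left\{\frac{\partial\Delta}{\partial[paa]}[pa\xi] + \frac{\partial\Delta}{\partial[pab]}[pb\xi] + \dots + \frac{\partial\Delta}{\partial[pam]}[pm\xi]\right\}, \tag{14}$$

$$\eta_2 = \frac{1}{\Delta}\left\{\frac{\partial\Delta}{\partial[pba]}[pa\xi] + \frac{\partial\Delta}{\partial[pbb]}[pb\xi] + \dots + \frac{\partial\Delta}{\partial[pbm]}[pm\xi]\right\}. \tag{15}$$

We have further, by the elementary theory of determinants,

$$[paa]\frac{\partial\Delta}{\partial[paa]} + [pab]\frac{\partial\Delta}{\partial[pab]} + \dots + [pam]\frac{\partial\Delta}{\partial[pam]} = \Delta, \tag{16}$$

$$[pka]\frac{\partial\Delta}{\partial[paa]} + [pkb]\frac{\partial\Delta}{\partial[pab]} + \dots + [pkm]\frac{\partial\Delta}{\partial[pam]} = 0, \qquad a \ne k. \tag{17}$$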


1  _  ^  .     (18) 


^A 


A 
^A 


^A 


,  ^  f^^^^  n^] "-  [^^^^^  nm '-"''  ^^^^^^^  ^t]^  ,  ^ , 

(19)1 

We have assumed throughout that we knew the weights of
our observations, but not the corresponding precisions, whose
squares are proportional to them,

$$k_i^2 = \frac{p_i}{2\rho}.$$

Here $\rho$ is a multiplier to be determined later. The quantities
$\eta_j$ are linear homogeneous combinations of the true errors $\xi_i$,
hence, by VII 12] they follow the Gaussian exponential law.
We must find their precisions, $K_j$. This we get by that same
theorem, namely,

$$\frac{1}{2K_1^2} = \frac{\rho}{\Delta^2}\sum_{i=1}^{n} p_i\left[a_i\frac{\partial\Delta}{\partial[paa]} + b_i\frac{\partial\Delta}{\partial[pab]} + \dots + m_i\frac{\partial\Delta}{\partial[pam]}\right]^2. \tag{20}$$

If we multiply equation (16) by $\dfrac{\partial\Delta}{\partial[paa]}$, and each equation
(17) by $\dfrac{\partial\Delta}{\partial[pak]}$, and add, we find

$$\frac{1}{2K_1^2} = \frac{\rho}{\Delta^2}\,\Delta\,\frac{\partial\Delta}{\partial[paa]} = \rho\,\frac{\partial\log\Delta}{\partial[paa]}. \tag{21}$$

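Another method of finding this would have been to find
the mean value of $\eta_1^2$. If we seek the mean value of $\eta_1\eta_2$ we
shall find, using equations (16)-(19),

$$\text{Mean value } \eta_1\eta_2 = \rho\,\frac{\partial\log\Delta}{\partial[pab]}. \tag{22}$$

Lastly, we find similarly

$$\text{Mean value } \xi_i\eta_1 = \rho\left[a_i\frac{\partial\log\Delta}{\partial[paa]} + b_i\frac{\partial\log\Delta}{\partial[pab]} + \dots + m_i\frac{\partial\log\Delta}{\partial[pam]}\right]. \tag{23}$$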

Why trouble about these last mean values? They are
needful to determine $\rho$. In the residual equations (6) let us
put the solutions of the normal equations and add corrections
$\delta_i$ on the right so that

$$a_iu_1 + b_iu_2 + \dots + m_iu_m = x_i + \delta_i.$$

We have also

$$a_iU_1 + b_iU_2 + \dots + m_iU_m = X_i.$$

Subtracting,

$$a_i\eta_1 + b_i\eta_2 + \dots + m_i\eta_m - \xi_i = -\delta_i.$$

Let us square each of these equations, multiply by the
corresponding $p_i$, and take the mean value of the sums, in
view of (16)-(23).

The expression $[p\delta\delta]$, which is an observed quantity, is
equal to its own mean value. The equivalent expression on
the left contains only terms of the types (21), (22), and (23):

$$[p\delta\delta] = \sum_{i=1}^{n} p_i\delta_i^2 = (n-m)\rho. \tag{24}$$

Summing up, we get the final expression

$$\frac{1}{2K_1^2} = \frac{\partial\log\Delta}{\partial[paa]}\,\frac{[p\delta\delta]}{n-m}, \qquad \frac{1}{2K_2^2} = \frac{\partial\log\Delta}{\partial[pbb]}\,\frac{[p\delta\delta]}{n-m}, \quad \dots \tag{25}$$

$$\text{Probable error of } u_1 = 0.6745\left(\frac{\partial\log\Delta}{\partial[paa]}\,\frac{[p\delta\delta]}{n-m}\right)^{1/2}. \tag{26}$$


It  is  clear  that  a  similar  type  of  calculation  could  be  applied 
in  the  more  general  case  (5). 
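The quantity $\partial\log\Delta/\partial[paa]$ in (25) and (26) is the cofactor of $[paa]$ divided by $\Delta$, and so may be computed directly from the coefficients of the normal equations. The following is a minimal sketch, assuming the coefficient matrix of the worked example of § 1; the function names are hypothetical, and the values of $[p\delta\delta]$ and $n$ used at the end are illustrative only, not taken from the text.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, r)) for r in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def dlogdet_diag(m, j):
    """d(log Delta)/d(m[j][j]): cofactor of the j-th diagonal term over Delta."""
    minor = [[v for k, v in enumerate(row) if k != j]
             for i, row in enumerate(m) if i != j]
    return det(minor) / det(m)

def probable_errors(normal, p_delta_delta, n_obs):
    """Formula (26) for every unknown in turn."""
    m = len(normal)
    scale = p_delta_delta / (n_obs - m)
    return [0.6745 * math.sqrt(dlogdet_diag(normal, j) * scale)
            for j in range(m)]

NORMAL = [
    [2, -1, 0, 0, 0],
    [-1, 4, -1, -1, 0],
    [0, -1, 2, -1, 0],
    [0, -1, -1, 3, -1],
    [0, 0, 0, -1, 3],
]

delta = det(NORMAL)
weight_term = dlogdet_diag(NORMAL, 0)
# Illustrative figures only: [p delta delta] = 4.0 from n = 9 observations.
errors = probable_errors(NORMAL, 4.0, 9)
```

For this matrix the determinant is the product of the successive leading coefficients met in the elimination, $2 \cdot \tfrac{7}{2} \cdot \tfrac{12}{7} \cdot \tfrac{7}{4} \cdot \tfrac{17}{7}$.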

§  2.     Conditioned  Observations. 

It sometimes happens that the quantities which we seek
are not independent, but are connected by certain identical
relations. The problem is to find the best values for them
subject to these restrictions. To begin with the most general
case, suppose that our quantities $u_1, u_2, \dots, u_m$ are connected
by the relations

$$\phi_1(u_1, u_2, \dots, u_m) = \phi_2(u_1, u_2, \dots, u_m) = \dots = \phi_l(u_1, u_2, \dots, u_m) = 0. \tag{27}$$

We assume that Assumption 2] may be extended to these
functions also, so that we may write these equations

$$\phi_k(u_{01}, \dots, u_{0m}) + \sum_j \frac{\partial\phi_k}{\partial u_j}\epsilon_j = 0, \qquad k = 1, \dots, l. \tag{28}$$

We have now a problem in relative minima, namely, to
minimize

$$\sum_i p_i\left[f_i(u_{01}, \dots, u_{0m}) - x_i + \sum_j \frac{\partial f_i}{\partial u_j}\epsilon_j\right]^2 - 2\sum_k \eta_k\left[\phi_k(u_{01}, \dots, u_{0m}) + \sum_j \frac{\partial\phi_k}{\partial u_j}\epsilon_j\right]. \tag{29}$$

The $m$ partial derivatives of this, and the $l$ equations (28)
will be sufficient to determine the corrections $\epsilon_j$ and the
multipliers $\eta_k$.

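A simple case arising frequently in practice is that where
we observe directly the quantities which we desire to calculate,
and where there is but one identical relation. We have
here a simpler form for (29), namely

$$\sum_{i=1}^{m} p_i\epsilon_i^2 - 2\eta\left[\phi(x_1, \dots, x_m) + \sum_i \frac{\partial\phi}{\partial x_i}\epsilon_i\right]. \tag{30}$$

$$\epsilon_i = -\frac{\phi}{p_i}\,\frac{\partial\phi/\partial x_i}{\sum_j \dfrac{1}{p_j}\left(\dfrac{\partial\phi}{\partial x_j}\right)^2}. \tag{31}$$

Example 1] The observed values of the angles of a triangle
are $\theta_1$, $\theta_2$, $\theta_3$; what are the best values?

$$\phi = \theta_1 + \theta_2 + \theta_3 - \pi, \qquad u_i = \theta_i - \tfrac{1}{3}(\theta_1 + \theta_2 + \theta_3 - \pi).$$

Example 2] The observed sides of a right triangle are $a$, $b$, $c$;
what are the best values?

$$\phi = a^2 + b^2 - c^2,$$

$$u_1 = a\left[1 - \frac{a^2 + b^2 - c^2}{2(a^2 + b^2 + c^2)}\right], \quad u_2 = b\left[1 - \frac{a^2 + b^2 - c^2}{2(a^2 + b^2 + c^2)}\right], \quad u_3 = c\left[1 + \frac{a^2 + b^2 - c^2}{2(a^2 + b^2 + c^2)}\right].$$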
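The single-relation case is short enough to sketch in code. The following assumes the linearised rule that follows from minimising (30), namely that each correction is $\eta\,\frac{1}{p_i}\frac{\partial\phi}{\partial x_i}$ with $\eta$ chosen so that the linearised relation is satisfied; the function name `adjust` and the numerical readings are hypothetical, chosen only for illustration.

```python
import math

def adjust(observed, weights, phi, grad_phi):
    """One linearised adjustment under a single relation phi = 0."""
    g = grad_phi(observed)
    # Multiplier fixed by requiring phi + sum(g_i * eps_i) = 0.
    eta = -phi(observed) / sum(gi * gi / p for gi, p in zip(g, weights))
    return [x + eta * gi / p for x, gi, p in zip(observed, g, weights)]

# Example 1: three angles (radians, equal weights), which must sum to pi.
angles = adjust([1.05, 1.04, 1.06], [1, 1, 1],
                lambda t: sum(t) - math.pi,
                lambda t: [1.0, 1.0, 1.0])

# Example 2: sides of a right triangle, relation a^2 + b^2 - c^2 = 0.
a, b, c = 3.02, 3.99, 5.01
sides = adjust([a, b, c], [1, 1, 1],
               lambda s: s[0] ** 2 + s[1] ** 2 - s[2] ** 2,
               lambda s: [2 * s[0], 2 * s[1], -2 * s[2]])
print(angles, sides)
```

In the first example the relation is linear, so the adjusted angles satisfy it exactly; in the second the relation is satisfied only to the first order in the corrections, which is all that the linearisation claims.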


§  3.    Curve  Fitting. 

An interesting application of the method of least squares is
to the problem of finding a curve of given type which will
best represent a given function. The problem, stated in this
bald fashion, is evidently indeterminate, until we define the
term 'best represent' by means of some postulates. Still,
there are several standard artifices which can be fairly well
justified.

Problem.

The sides of a triangle of homogeneous material are measured, and the
area is determined by weight. Find the best values for the sides.

Suppose, first, that we wish to find a power series development
for a given function $\phi(x)$. The reader might be
inclined to answer immediately that this is a very old
problem indeed; we have merely to take Maclaurin's series.
The Maclaurin series is, indeed, absolutely correct if all the
terms be included. This can never be done in practice.
When we take merely a finite number of terms, the Maclaurin
series has merely the property that it is the best possible
representation of the function very near the origin, i.e. that
it takes the same value there as does the function, and that if
$n+1$ terms be taken the first $n$ derivatives take the same
value at the origin, whether we differentiate the function or
the series. This does not by any means show that we could
not perhaps find another polynomial of the same degree
which gave a better average representation of the function
throughout a certain interval. Let us write the general
polynomial of degree $n$

$$a_0 + a_1x + a_2x^2 + \dots + a_nx^n.$$

How shall we determine the coefficients so that this shall
best represent a given function throughout an interval?
The most obvious way would be to divide the interval into
$n$ equal parts and determine the $a$'s so that at each of the
$n+1$ bounding points the function and the polynomial had
the same values. This would be the method of interpolation.

A little reflection will now lead us to the idea that it is
unwise to limit ourselves to $n+1$ points. Why not take
a good many more points, so that the number of equations
will be larger than that of unknowns, and solve by the
methods developed in the present chapter? Let us take
the points $x_0, x_1, \dots, x_N$ equally spaced by the interval $\Delta x$.

The residual equations are

$$
\begin{aligned}
a_0 + a_1x_0 + a_2x_0^2 + \dots + a_nx_0^n &= \phi(x_0),\\
a_0 + a_1x_1 + a_2x_1^2 + \dots + a_nx_1^n &= \phi(x_1),\\
&\;\;\vdots\\
a_0 + a_1x_N + a_2x_N^2 + \dots + a_nx_N^n &= \phi(x_N).
\end{aligned}
$$

The normal equations are

$$
\begin{aligned}
a_0\sum 1 + a_1\sum x_i + \dots + a_n\sum x_i^n &= \sum\phi(x_i),\\
a_0\sum x_i + a_1\sum x_i^2 + \dots + a_n\sum x_i^{n+1} &= \sum x_i\phi(x_i),\\
&\;\;\vdots\\
a_0\sum x_i^n + a_1\sum x_i^{n+1} + \dots + a_n\sum x_i^{2n} &= \sum x_i^n\phi(x_i).
\end{aligned}
\tag{32}
$$

The next step is obvious. Why not keep $x_0$ and $x_N$ fixed,
multiply each of these equations by $\Delta x$ and take the limit
as $\Delta x \to 0$?

$$
\begin{aligned}
a_0\int_{x_0}^{x_N} dx + a_1\int_{x_0}^{x_N} x\,dx + \dots + a_n\int_{x_0}^{x_N} x^n\,dx &= \int_{x_0}^{x_N}\phi(x)\,dx,\\
a_0\int_{x_0}^{x_N} x\,dx + a_1\int_{x_0}^{x_N} x^2\,dx + \dots + a_n\int_{x_0}^{x_N} x^{n+1}\,dx &= \int_{x_0}^{x_N} x\,\phi(x)\,dx,\\
&\;\;\vdots\\
a_0\int_{x_0}^{x_N} x^n\,dx + a_1\int_{x_0}^{x_N} x^{n+1}\,dx + \dots + a_n\int_{x_0}^{x_N} x^{2n}\,dx &= \int_{x_0}^{x_N} x^n\phi(x)\,dx.
\end{aligned}
\tag{33}
$$

Note that these equations determine $a_0, a_1, \dots, a_n$ so that

$$\int_{x_0}^{x_N}\left[a_0 + a_1x + \dots + a_nx^n - \phi(x)\right]^2 dx$$

is a minimum.

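Example] Find the value for $\sin x$ in the interval $-\dfrac{\pi}{2} \le x \le \dfrac{\pi}{2}$
in the form of a polynomial

$$a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4.$$

To begin with, it is well to have this polynomial vanish
with $x$. Moreover, we should like it, like $\sin x$, to be an odd
function. Hence we write

$$a_1x + a_3x^3 = \sin x.$$

$$
\begin{aligned}
a_1\int_0^{\pi/2} x^2\,dx + a_3\int_0^{\pi/2} x^4\,dx &= \int_0^{\pi/2} x\sin x\,dx,\\
a_1\int_0^{\pi/2} x^4\,dx + a_3\int_0^{\pi/2} x^6\,dx &= \int_0^{\pi/2} x^3\sin x\,dx,
\end{aligned}
$$

$$\frac{a_1}{3}\left(\frac{\pi}{2}\right)^3 + \frac{a_3}{5}\left(\frac{\pi}{2}\right)^5 = 1, \qquad \frac{a_1}{5}\left(\frac{\pi}{2}\right)^5 + \frac{a_3}{7}\left(\frac{\pi}{2}\right)^7 = 3\left(\frac{\pi}{2}\right)^2 - 6,$$

$$a_3 = -0.1450, \qquad a_1 = 0.9888.$$

In the usual Maclaurin series

$$a_3 = -0.1667, \qquad a_1 = 1.$$

For $x = \dfrac{\pi}{2}$, the present series gives

$$\sin\frac{\pi}{2} = 0.991.$$

Two terms of Maclaurin's series will give

$$\sin\frac{\pi}{2} = 0.924.$$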
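The figures of this example are easily reproduced. The following is a minimal sketch, assuming the two normal equations for the odd polynomial $a_1x + a_3x^3$ taken over the half interval from $0$ to $\pi/2$, with the integrals evaluated in closed form ($\int x\sin x\,dx = 1$ and $\int x^3\sin x\,dx = 3(\pi/2)^2 - 6$ between those limits); the variable and function names are hypothetical.

```python
import math

h = math.pi / 2
# Integrals of x^2, x^4, x^6 over [0, pi/2]; the integrands are even,
# so the half interval gives the same equations as the whole one.
i2, i4, i6 = h ** 3 / 3, h ** 5 / 5, h ** 7 / 7
s1 = 1.0                 # integral of x sin x
s3 = 3 * h ** 2 - 6      # integral of x^3 sin x

# Solve the two normal equations by Cramer's rule.
det = i2 * i6 - i4 * i4
a1 = (s1 * i6 - i4 * s3) / det
a3 = (i2 * s3 - i4 * s1) / det

def fitted(x):
    """Least-squares cubic for sin x on [-pi/2, pi/2]."""
    return a1 * x + a3 * x ** 3

def maclaurin(x):
    """First two terms of the Maclaurin series of sin x."""
    return x - x ** 3 / 6

print(round(a1, 4), round(a3, 4), round(fitted(h), 3), round(maclaurin(h), 3))
```

The fitted cubic gives up exactness near the origin in exchange for a much smaller error at the ends of the interval, which is the point of the comparison.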

There frequently arise cases where we do not wish to
express a given function in power series, but in some other
shape. Suppose, for instance, that we wanted a trigonometric
series for the interval from $-\pi$ to $+\pi$ of the form

$$a_0 + a_1\cos x + a_2\cos 2x + a_3\cos 3x + \dots + b_1\sin x + b_2\sin 2x + b_3\sin 3x + \dots.$$

We replace our equations (33) by

$$
\begin{aligned}
\int_{-\pi}^{\pi}\sum_k\left[a_k\cos kx + b_k\sin kx\right]dx &= \int_{-\pi}^{\pi}\phi(x)\,dx,\\
\int_{-\pi}^{\pi}\cos lx\,\sum_k\left[a_k\cos kx + b_k\sin kx\right]dx &= \int_{-\pi}^{\pi}\cos lx\,\phi(x)\,dx,\\
\int_{-\pi}^{\pi}\sin mx\,\sum_k\left[a_k\cos kx + b_k\sin kx\right]dx &= \int_{-\pi}^{\pi}\sin mx\,\phi(x)\,dx.
\end{aligned}
$$

Now

$$
\begin{aligned}
\int_{-\pi}^{\pi}\sin mx\,\sin kx\,dx &= 0, \qquad m \ne k,\\
\int_{-\pi}^{\pi}\sin kx\,\cos lx\,dx &= 0,\\
\int_{-\pi}^{\pi}\cos kx\,\cos lx\,dx &= 0, \qquad l \ne k,\\
\int_{-\pi}^{\pi}\sin^2 kx\,dx &= \pi, \qquad k \ne 0,\\
\int_{-\pi}^{\pi}\cos^2 kx\,dx &= \pi, \qquad k \ne 0,
\end{aligned}
$$

$$a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\phi(x)\,dx, \qquad a_k = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos kx\,\phi(x)\,dx, \qquad b_k = \frac{1}{\pi}\int_{-\pi}^{\pi}\sin kx\,\phi(x)\,dx.$$


All this, however, is nothing in the world but the usual
determination for the coefficients in a Fourier series, so that
we reach the interesting result that whereas a finite number
of terms of a Maclaurin series does not give the best polynomial
development of a given function over any interval, any
number of terms of the Fourier series will give, for the
interval $-\pi$ to $\pi$, the best development involving just those
terms.
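The least-squares character of the Fourier coefficients may be illustrated numerically. The following is a minimal sketch, assuming the test function $\phi(x) = x$ (chosen here only for illustration, its sine coefficients being known to be $2, -1, \tfrac{2}{3}, \dots$) and a simple midpoint rule for the integrals; all function names are hypothetical.

```python
import math

def integrate(g, lo, hi, steps=4000):
    """Midpoint rule; adequate for a sketch."""
    w = (hi - lo) / steps
    return w * sum(g(lo + (i + 0.5) * w) for i in range(steps))

def fourier_coefficients(phi, order):
    lo, hi = -math.pi, math.pi
    a0 = integrate(phi, lo, hi) / (2 * math.pi)
    a = [integrate(lambda x: math.cos(k * x) * phi(x), lo, hi) / math.pi
         for k in range(1, order + 1)]
    b = [integrate(lambda x: math.sin(k * x) * phi(x), lo, hi) / math.pi
         for k in range(1, order + 1)]
    return a0, a, b

def squared_error(phi, a0, a, b):
    """Integral over (-pi, pi) of (series - phi)^2."""
    def series(x):
        return a0 + sum(ak * math.cos((k + 1) * x) + bk * math.sin((k + 1) * x)
                        for k, (ak, bk) in enumerate(zip(a, b)))
    return integrate(lambda x: (series(x) - phi(x)) ** 2, -math.pi, math.pi)

phi = lambda x: x
a0, a, b = fourier_coefficients(phi, 3)
best = squared_error(phi, a0, a, b)
# Disturb one coefficient: the integrated squared error must increase.
worse = squared_error(phi, a0, a, [b[0] + 0.1] + b[1:])
print(round(b[0], 4), round(b[1], 4), best < worse)
```

By the orthogonality relations, altering a single coefficient by $\epsilon$ raises the integrated squared error by exactly $\pi\epsilon^2$, whatever the other terms may be.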

Let us take a still more general case, and try to represent
$\phi$ by a function of known type and undetermined coefficients

$$f(x, a_0, a_1, \dots, a_n).$$

Following our precedent in the case of a polynomial where
we had an infinite number of points, we should like to minimize

$$\int_{x_0}^{x_N}(f - \phi)^2\,dx.$$

To do this, we equate to 0 the partial derivatives of this
integral with regard to the $a$'s, and throw in the supplementary
condition that the areas under the two curves should
be the same.

We get

$$\int_{x_0}^{x_N} f(x)\,dx = \int_{x_0}^{x_N}\phi(x)\,dx, \qquad \int_{x_0}^{x_N}\frac{\partial f}{\partial a_i}f(x)\,dx = \int_{x_0}^{x_N}\frac{\partial f}{\partial a_i}\phi(x)\,dx. \tag{34}$$

Problem.

Calculate $\cos x$ in the same way.


The trouble here is the very prosaic one that, except when
$f$ is a polynomial, the eliminations are altogether unmanageable.
We must seek another method.* If we look closely
at the equations (33) and inquire into their geometrical
meaning, we find it to be this. The first $n+1$ moments of
the areas under the curve $\phi$ and under the polynomial, about
the $y$ axis, are equal to one another. This suggests the idea
that, in the general case, we should find the coefficients by
generalizing the process of equating these moments. We
thus replace the equations (34) by

$$
\begin{aligned}
\int_{x_0}^{x_N} f(x)\,dx &= \int_{x_0}^{x_N}\phi(x)\,dx,\\
&\;\;\vdots\\
\int_{x_0}^{x_N} x^n f(x)\,dx &= \int_{x_0}^{x_N} x^n\phi(x)\,dx.
\end{aligned}
\tag{35}
$$

These equations will usually be easier to handle in practice.
It is to be noted also that we pass from (34) to (35) by
expanding $f$ in Maclaurin's series.

* Cf. Karl Pearson, 'On the Systematic Fitting of Curves', Biometrika, vol. i,
1901-2, and vol. ii, 1902-3.

The problem in practice frequently assumes a different
form, in that the function $\phi$ is not given, merely the $n+1$
pairs of points $(x_0, y_0), (x_1, y_1), \dots, (x_n, y_n)$. Let us imagine that
these are joined by a broken line which must replace the
curve $\phi$. We are faced with the laborious mathematical
problem of calculating the moments about the $y$ axis of
a series of trapezoids standing on the $x$ axis. For simplicity
we shall assume that the intervals on the $x$ axis are all equal,
so that

$$x_{k+1} - x_k = b.$$

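For the upper (or lower) side of the $(k+1)$th trapezoid

$$y = y_k + \frac{y_{k+1} - y_k}{b}(x - x_k).$$

The $m$th moment of this trapezoid is

$$
\begin{aligned}
&\int_{x_k}^{x_k+b}\left[y_k + \frac{y_{k+1} - y_k}{b}(x - x_k)\right]x^m\,dx\\
&\quad= \frac{1}{b}\left[y_k\left(\frac{(x_k+b)\,x^{m+1}}{m+1} - \frac{x^{m+2}}{m+2}\right) + y_{k+1}\left(\frac{x^{m+2}}{m+2} - \frac{x_k\,x^{m+1}}{m+1}\right)\right]_{x_k}^{x_k+b}\\
&\quad= \frac{1}{b(m+1)(m+2)}\Big\{y_k\left[(x_k+b)^{m+2} - x_k^{m+2} - b(m+2)x_k^{m+1}\right]\\
&\qquad\qquad + y_{k+1}\left[x_k^{m+2} - (x_k+b)^{m+2} + b(m+2)(x_k+b)^{m+1}\right]\Big\}.
\end{aligned}
$$

We get the total coefficient of $y_k$ by adding the first part of
this to the last of the preceding term, namely

$$
\begin{aligned}
&\frac{1}{b(m+1)(m+2)}\left[(x_k+b)^{m+2} + (x_k-b)^{m+2} - 2x_k^{m+2}\right]\\
&\quad= 2b\left[\frac{x_k^m}{2!} + \frac{m(m-1)}{4!}x_k^{m-2}b^2 + \frac{m(m-1)(m-2)(m-3)}{6!}x_k^{m-4}b^4 + \dots\right].
\end{aligned}
$$

There remain the end $y$'s, each of which appears in only
one term. The total coefficient of $y_0$ is

$$
\begin{aligned}
&\frac{1}{b(m+1)(m+2)}\left[(x_0+b)^{m+2} - x_0^{m+2} - (m+2)x_0^{m+1}b\right]\\
&\quad= 2b\left[\frac{x_0^m}{2!} + \frac{m(m-1)}{4!}x_0^{m-2}b^2 + \frac{m(m-1)(m-2)(m-3)}{6!}x_0^{m-4}b^4 + \dots\right]\\
&\qquad - \frac{1}{b(m+1)(m+2)}\left[(x_0-b)^{m+2} - x_0^{m+2} + (m+2)x_0^{m+1}b\right].
\end{aligned}
$$

The total coefficient of $y_n$ is

$$
\begin{aligned}
&2b\left[\frac{x_n^m}{2!} + \frac{m(m-1)}{4!}x_n^{m-2}b^2 + \frac{m(m-1)(m-2)(m-3)}{6!}x_n^{m-4}b^4 + \dots\right]\\
&\qquad - \frac{1}{b(m+1)(m+2)}\left[(x_n+b)^{m+2} - x_n^{m+2} - (m+2)x_n^{m+1}b\right].
\end{aligned}
$$

Lastly, if we sum from $k = 0$ to $k = n$, the total $m$th moment is

$$
\begin{aligned}
&2b\sum_{k=0}^{n} y_k\left[\frac{x_k^m}{2!} + \frac{m(m-1)}{4!}x_k^{m-2}b^2 + \frac{m(m-1)(m-2)(m-3)}{6!}x_k^{m-4}b^4 + \dots\right]\\
&\quad - \frac{1}{b(m+1)(m+2)}\Big\{y_0\left[(x_0-b)^{m+2} - x_0^{m+2} + (m+2)x_0^{m+1}b\right]\\
&\qquad\qquad + y_n\left[(x_n+b)^{m+2} - x_n^{m+2} - (m+2)x_n^{m+1}b\right]\Big\}.
\end{aligned}
\tag{36}
$$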

This unlovely formula probably represents the simplest
form attainable. It is visibly simpler in the case where

$$y_0 = y_n = 0.$$
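Formula (36) can be tested against a direct summation of the moments of the separate trapezoids. The following is a minimal sketch, assuming equally spaced abscissae with common interval $b$ and using the closed form $\left[(x+b)^{m+2} + (x-b)^{m+2} - 2x^{m+2}\right]/\left[b(m+1)(m+2)\right]$ for the coefficient of every ordinate before the end corrections are subtracted; the function names and the sample ordinates are hypothetical.

```python
def moment_direct(xs, ys, m):
    """Sum of the exact m-th moments of the separate trapezoids."""
    total = 0.0
    for k in range(len(xs) - 1):
        x0, x1, y0, y1 = xs[k], xs[k + 1], ys[k], ys[k + 1]
        slope = (y1 - y0) / (x1 - x0)
        c = y0 - slope * x0          # upper side is y = c + slope * x
        total += c * (x1 ** (m + 1) - x0 ** (m + 1)) / (m + 1)
        total += slope * (x1 ** (m + 2) - x0 ** (m + 2)) / (m + 2)
    return total

def moment_formula(xs, ys, m, b):
    """Every ordinate with the interior coefficient, less the end corrections."""
    d = b * (m + 1) * (m + 2)

    def interior(x):
        return ((x + b) ** (m + 2) + (x - b) ** (m + 2) - 2 * x ** (m + 2)) / d

    total = sum(y * interior(x) for x, y in zip(xs, ys))
    x0, xn = xs[0], xs[-1]
    total -= ys[0] * ((x0 - b) ** (m + 2) - x0 ** (m + 2)
                      + (m + 2) * b * x0 ** (m + 1)) / d
    total -= ys[-1] * ((xn + b) ** (m + 2) - xn ** (m + 2)
                       - (m + 2) * b * xn ** (m + 1)) / d
    return total

# Hypothetical data: five ordinates at interval b = 0.5.
XS = [1.0, 1.5, 2.0, 2.5, 3.0]
YS = [2.0, 3.0, 1.0, 4.0, 2.5]
for m in range(4):
    print(m, moment_direct(XS, YS, m), moment_formula(XS, YS, m, 0.5))
```

For $m = 0$ the interior coefficient reduces to $b$ and each end correction to $\tfrac{1}{2}b$ times the end ordinate, which is the ordinary trapezoidal rule for the area.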
The whole subject of curve fitting leads us naturally to
a topic which has attained a large development, through the
efforts of the large number of writers on mathematical
statistics. In England Karl Pearson has founded a whole
school, and on the Continent the Scandinavians have shown
themselves particularly skilful in developing the theory. A
thorough discussion of all of these new methods will be found
in Arne Fisher's Mathematical Theory of Probabilities.*

* 2nd edition, New York, 1922, Parts II and III.


CHAPTER  X 

THE  STATISTICAL  THEORY  OF  GASES 

§  1.    General  Properties  of  Perfect  Gases 

In Chapter V, which dealt with geometrical probability,
we excused ourselves for lingering over such an elegant trifle
on the ground of the connexion with the kinetic theory of
gas, and the methods of statistical mechanics. It is now
time to give a very summary introduction to these extended
topics.* We shall content ourselves with giving the method
for deriving Maxwell's expression for normal distribution,
and critical comments thereon; it is not our business to
deduce physical properties.

* In the present chapter I have leaned very heavily on Castelnuovo, loc.
cit., ch. xiii. For a more detailed study see Jeans, Dynamical Theory of Gases,
3rd edition, Cambridge, 1921, chs. ii-iv.

A gas, for our present purposes, is conceived as a very
large agglomeration of very small molecules in rapid motion.
We consider the case of a gas confined in some vessel, and, as
a first approximation, make the following assumptions:

I) The vessel is of finite volume, with perfectly elastic
walls which are surfaces given by differentiable equations.

II) All gas molecules are smooth, incompressible, perfectly
elastic spheres of uniform diameter $\sigma$ and mass, acting under
the influence of no forces.

It is evident that, under Newton's second law, each molecule
will be in a state of rectilinear motion at uniform
velocity, or at rest, except when its course is altered by
a collision with a boundary wall or with another molecule.
We are not concerned with the actual shell of the vessel, but
with a surface parallel thereto, a radius distance inside, for
this is the effective limit for the centre of a molecule; when
we speak of the boundary, we mean this latter surface. The
laws of collision will then be the following:

A)  When  two  spheres  collide,  the  motion  of  their  centre  of 
gravity  is  unaltered,  and  the  vector  velocity  of  one  centre 
with  regard  to  the  other  is  replaced  by  its  reflection  in  the 
common  tangent  plane  at  the  point  of  impact. 

B)  If  the  centre  of  a  sphere  meet  a  boundary,  the  vector 
velocity  after  impact  is  the  reflection  of  the  vector  velocity 
before  impact,  in  the  tangent  plane  to  the  boundary  at  that 
point. 

It  is  possible  that  some  molecules  will  strike  exactly  into 
cracks  between  two  parts  of  the  surface,  which  amounts  to 
a  molecule  centre  striking  a  double  curve  of  the  boundary, 
but  this  case  will  occur  an  infinitesimal  number  of  times,  and 
may  be  overlooked. 

Since the rotations of the spheres are of no importance, the
essential point to be borne in mind is that the total vis viva
of the system will be constant. The phenomena which we call
'temperature' and 'pressure' depend upon molecular velocities;
it is with them that we shall be specially occupied.

Let the total number of molecules be $n$. The coordinates
of the centre of the $i$th molecule shall be $x_i, y_i, z_i$, the
components of its velocity $u_i, v_i, w_i$. Since the motion is
unaccelerated, a knowledge at any instant of the $6n$ quantities
$x_1, y_1, z_1, \dots, x_n, y_n, z_n, u_1, v_1, w_1, \dots, u_n, v_n, w_n$
will give a complete account
of the state of the gas. Moreover, this knowledge will,
theoretically, serve to predict the exact state at any future
instant; in other words, the whole history is determined by
the initial conditions of situation and velocity.

§  2.    Representation  in  Hyperspace. 

There is a great saving in words, in dealing with the gas
problem, if we use the language, or jargon, of the geometry
of many dimensions. The reader must not allow himself to
be unduly alarmed by this proceeding. If an object be
determined by $N$ independent variables, we say that each
of its determinations corresponds to a point in a space of $N$
dimensions. When the variables are connected by one or
more equations, we say that we have a variety in the original
space, the number of dimensions of the variety being that of
the space, less the number of independent equations. A
variety of $N-1$ dimensions is called a hypersurface; if the
equation be linear we call it a hyperplane. When we speak
of the distance of two points, we mean the expression obtained
by analogy from the expression for the distance of two points
in three dimensional space in terms of their rectangular
cartesian coordinates. The general laws for combining distances
are the same no matter how many the dimensions.

We start with a space of $6n$ dimensions where a point has
the coordinates $x_1, y_1, z_1, \dots, x_n, y_n, z_n, u_1, v_1, w_1, \dots, u_n, v_n, w_n$. Such a
point will represent the state of a gas of $n$ molecules. As
the gas changes with time, so does the point move in the
given space. What is the nature of its path? Since each
molecule is moving at a uniform velocity along a straight
path we have

$$
\begin{aligned}
x_i' &= x_i + tu_i, & u_i' &= u_i,\\
y_i' &= y_i + tv_i, & v_i' &= v_i,\\
z_i' &= z_i + tw_i, & w_i' &= w_i.
\end{aligned}
\tag{1}
$$

The representing point is moving along, at uniform velocity,
on a path parallel to the flat variety of $3n$ dimensions
obtained by giving the last $3n$ coordinates fixed values.
There will be a sharp break in this path corresponding to
each collision in the gas. To understand these, we must first
note that in the space of $6n$ dimensions there are certain
limiting hypersurfaces which the representing point cannot
pass. The centre of no molecule can pass the boundary, and
the distance between two centres can never be less than $\sigma$,
hence

$$
\begin{aligned}
&f_1(x_i, y_i, z_i) \ge 0, \quad f_2(x_i, y_i, z_i) \ge 0, \ \dots \qquad i = 1, 2, \dots, n,\\
&(x_j - x_k)^2 + (y_j - y_k)^2 + (z_j - z_k)^2 - \sigma^2 \ge 0, \qquad j, k = 1, \dots, n.
\end{aligned}
\tag{2}
$$

Moreover, the vis viva of the system has the constant value
$E$; hence

$$\sum_{i=1}^{n}\left(u_i^2 + v_i^2 + w_i^2\right) - E = 0. \tag{3}$$

The representing point in $6n$ dimensions must thus remain
on the hypersurface (3) moving along a straight line. At the
moment of a collision in the gas it takes a sharp jump along
one of the hypersurfaces (2).

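We get a clearer idea by using a slightly simpler representation,
namely, taking the space of $3n$ dimensions, where
a point has the coordinates $x_1, y_1, z_1, \dots, x_n, y_n, z_n$. The $3n$ quantities
$u_1, v_1, w_1, \dots, u_n, v_n, w_n$ are the components of the velocity of this
point, the path will be rectilinear, and the velocity uniform
until the moment when there is a collision in the gas. If
a molecule collide with the boundary $f = 0$ we have the
following relation between the components of velocity before
and after:

$$u_i' = u_i - 2\,\frac{\partial f}{\partial x_i}\;\frac{u_i\dfrac{\partial f}{\partial x_i} + v_i\dfrac{\partial f}{\partial y_i} + w_i\dfrac{\partial f}{\partial z_i}}{\left(\dfrac{\partial f}{\partial x_i}\right)^2 + \left(\dfrac{\partial f}{\partial y_i}\right)^2 + \left(\dfrac{\partial f}{\partial z_i}\right)^2}. \tag{4}$$

There will be similar equations for $v_i'$ and $w_i'$. On the
other hand, if the molecules $x_j, y_j, z_j$ and $x_k, y_k, z_k$ collide, we have

$$x_k = x_j + l\sigma, \qquad y_k = y_j + m\sigma, \qquad z_k = z_j + n\sigma,$$

$$
\begin{aligned}
u_j' - u_k' &= -2l\left[l(u_j - u_k) + m(v_j - v_k) + n(w_j - w_k)\right] + u_j - u_k,\\
u_j' &= -l\left[l(u_j - u_k) + m(v_j - v_k) + n(w_j - w_k)\right] + u_j,\\
u_k' &= l\left[l(u_j - u_k) + m(v_j - v_k) + n(w_j - w_k)\right] + u_k.
\end{aligned}
\tag{5}
$$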
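The two laws of collision translate directly into code. The following is a minimal sketch, assuming equal masses (as in assumption II) and taking the direction cosines $l, m, n$ of the line of centres, or the normal to the boundary, as given; the function names and the numerical velocities are hypothetical.

```python
import math

def unit(direction):
    norm = math.sqrt(sum(c * c for c in direction))
    return [c / norm for c in direction]

def collide(vj, vk, direction):
    """Law A: exchange the components of velocity along the line of centres."""
    d = unit(direction)
    # Component of the relative velocity along the line of centres.
    s = sum(dc * (a - b) for dc, a, b in zip(d, vj, vk))
    vj_new = [a - dc * s for a, dc in zip(vj, d)]
    vk_new = [b + dc * s for b, dc in zip(vk, d)]
    return vj_new, vk_new

def reflect(v, normal):
    """Law B: reflect a velocity in the tangent plane with the given normal."""
    d = unit(normal)
    s = sum(dc * a for dc, a in zip(d, v))
    return [a - 2 * dc * s for a, dc in zip(v, d)]

def vis_viva(*velocities):
    return sum(c * c for v in velocities for c in v)

# Hypothetical velocities and line of centres.
vj, vk = [1.0, -2.0, 0.5], [-0.3, 0.4, 2.0]
vj2, vk2 = collide(vj, vk, [1.0, 2.0, 2.0])
wall = reflect(vj, [0.0, 0.0, 1.0])
print(vj2, vk2, wall)
```

The sum of the two velocities and the total vis viva are both unaltered by `collide`, and the relative velocity is reflected in the common tangent plane, as law A requires.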

These equations have a simple meaning. Let us suppose
that the laws of elastic bodies are the same in the space of $3n$
dimensions as they are in our space, i.e. when an elastic
point encounters a hypersurface, it bounds off with a vector
velocity which is the reflection in the tangent hyperplane of
the previous vector velocity. The interpretation of equations
(4) and (5) is then as follows:*

Theorem 1] If n gas molecules be represented by a single
point in a space of 3n dimensions, the alterations of
the gas are perfectly represented by the motion of this
point as it traces a straight line with the uniform
velocity given by (1) or rebounds elastically from one
of the hypersurfaces given by (2), the square of its
velocity being constantly given by (3).

* Cf. Borel, 'Sur les principes de la théorie cinétique des gaz', Annales de
l'École Normale, Series 2, vol. xxvi, 1904, p. 24.

There is another conclusion of a more complicated nature
which can be drawn from equations (4) and (5). We have

$$\left[\frac{\partial(u_i', v_i', w_i')}{\partial(u_i, v_i, w_i)}\right]^2 = \left[\frac{\partial(u_j', v_j', w_j', u_k', v_k', w_k')}{\partial(u_j, v_j, w_j, u_k, v_k, w_k)}\right]^2 = 1,$$

so that

$$\frac{\partial(x_1', y_1', z_1', \dots, x_n', y_n', z_n', u_1', v_1', w_1', \dots, u_n', v_n', w_n')}{\partial(x_1, y_1, z_1, \dots, x_n, y_n, z_n, u_1, v_1, w_1, \dots, u_n, v_n, w_n)} = 1.$$

If, then, we express our volume element in $6n$ dimensions
as we do in 3 dimensions, we see that the transformations
(4) and (5) are of a sort to preserve volumes. Suppose that
the probability that a point in this space should lie close to
an assigned position be proportional to a quantity differing
by an infinitesimal of higher order from

$$dx_1\,dy_1\,dz_1 \dots dx_n\,dy_n\,dz_n\,du_1\,dv_1\,dw_1 \dots du_n\,dv_n\,dw_n,$$

the element of volume, then after any time there is an equal
probability that it will lie equally near to the transformed
position. This leads to:

Assumption 1] The probability that a gas should have properties
analytically expressible in terms of the 6n coordinates
is proportional to the volume of that portion of the
space of 6n dimensions in which the representing point
must then lie.

Our work in geometrical probability shows how arbitrary
this assumption is, for it amounts to assuming that

$$x_1, y_1, z_1, \dots, x_n, y_n, z_n, u_1, v_1, w_1, \dots, u_n, v_n, w_n$$

are the natural independent variables, and that the element
of volume is given by the differential expression above.
From this assumption, and from the proof that the large
Jacobian written above is equal to 1, we deduce:

Theorem 2] If a set of gas molecules be such that at a time $t_1$
there is a certain probability that it will possess
a property of the type mentioned in Assumption 1]
then there is an equal probability that at the time $t_2$ it
will possess the transformed property.

It is scarcely necessary to warn the reader that the word
'probability' as used in this chapter must be understood in
the statistical sense that we defined in Ch. I and have
consistently used throughout the present work.

§  3.    First  Deduction  of  Maxwell's  Law. 

Our Theorem 2] tells us that the probability that a gas
should have a certain property is proportional to the volume
of a certain region in the space of $6n$ dimensions. If, however,
this property have to do merely with the vector velocities
of the particles, as the only limitation on these is the equation
(3), we may confine ourselves to a representation in a space of
$3n$ dimensions only, where the coordinates are

$$u_1, v_1, w_1, \dots, u_n, v_n, w_n.$$

Now, by the form of Assumption 1] the probability that
a certain gas should have a certain property is proportional
to the volume of a certain region in this space of $3n$ dimensions,
not proportional to the hyper-area cut by the region from
the hyper-sphere (3). The element of volume is

$$du_1\,dv_1\,dw_1 \dots du_n\,dv_n\,dw_n = (du_1\,dv_1\,dw_1) \dots (du_n\,dv_n\,dw_n),$$

and is the product of $n$ different volume elements in the
three-dimensional space $u, v, w$. Hence the probability
sought is proportional to the product of $n$ different volumes
in three-dimensional space. These $n$ points we shall call
'velocity points'.

Suppose next that the three-dimensional velocity space
u, v, w is divided into a very large number $\nu$ of cells of equal
volume V. When we say a large number, we mean that, as
a first approximation, the coordinates of all points in one cell
may be taken as identical. We shall assume, however, that
n is so large that it is well above $\nu$. The probability that the
first $a_1$ points shall be in the first cell, the second $a_2$ in
the second, the last $a_\nu$ in the last is proportional to

$$V^{a_1}\,V^{a_2} \cdots V^{a_\nu},$$

and since we have

$$a_1 + a_2 + \dots + a_\nu = n \tag{6}$$

this is $V^n$.

On the other hand, the probability that some $a_1$ lie in the
first cell, some $a_2$ in the second, some $a_\nu$ in the last is
proportional to this number multiplied by the number of ways
in which we can divide n objects into distinguishable groups
of $a_1, a_2, \dots, a_\nu$ respectively. This is given by Ch. II (3), so
that our probability is proportional to

$$P = \frac{n!\,V^n}{a_1!\,a_2! \cdots a_\nu!}. \tag{7}$$

Our fundamental question is to find a set of values

$$a_1,\ a_2,\ \dots,\ a_\nu,$$

subject to (3), which will make this a maximum.

Since P is a maximum with its logarithm, we must maximize

$$\log(n!) - \log(a_1!) - \log(a_2!) - \dots - \log(a_\nu!).$$

By Stirling's formula

$$\log(r!) = (r + \tfrac12)\log r - r + \tfrac12\log 2\pi.$$
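
As a rough numerical check of the formula just quoted, the following sketch compares it with the exact value of log r!, obtained here from the log-gamma function of the Python standard library. The function names are ours and serve for illustration only.

```python
import math


def stirling_log_factorial(r: int) -> float:
    """Stirling's value: (r + 1/2) log r - r + (1/2) log(2*pi)."""
    return (r + 0.5) * math.log(r) - r + 0.5 * math.log(2.0 * math.pi)


def exact_log_factorial(r: int) -> float:
    """Exact log r!, computed as log Gamma(r + 1)."""
    return math.lgamma(r + 1)


if __name__ == "__main__":
    for r in (5, 50, 500, 5000):
        gap = exact_log_factorial(r) - stirling_log_factorial(r)
        # The gap is positive and lies just below 1/(12 r).
        print(r, gap, 1.0 / (12 * r))
```

For each r tried, the exact logarithm exceeds Stirling's value by a little less than 1/(12r), so that the approximation improves steadily as r grows.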

Hence, we must minimize

$$\sum_{i=1}^{\nu} (a_i + \tfrac12)\log a_i.$$

Assuming all of the $a_i$'s are large, it will suffice to minimize

$$\sum_{i=1}^{\nu} a_i\log a_i,$$

where

$$\sum_{i=1}^{\nu} a_i = n, \qquad \sum_{i=1}^{\nu} a_i\,(u_i^2 + v_i^2 + w_i^2) = E.$$

Strictly speaking, the quantities $a_i$ are integers, but we
may obtain a sufficiently accurate answer by treating them
as if they were capable of continuous variation, and looking
for a relative minimum. We have thus

$$\log a_i = r\,(u_i^2 + v_i^2 + w_i^2) - 1 + s.$$
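
The step to this relation is presumably the method of undetermined multipliers. A sketch, on the assumption that r and s are the multipliers attached to the two conditions, and writing $e_i$ for $u_i^2 + v_i^2 + w_i^2$ (a shorthand of ours, not of the text):

```latex
\begin{aligned}
\Phi &= \sum_{i=1}^{\nu} a_i \log a_i
        - s\Bigl(\sum_{i=1}^{\nu} a_i - n\Bigr)
        - r\Bigl(\sum_{i=1}^{\nu} a_i e_i - E\Bigr), \\
\frac{\partial \Phi}{\partial a_i} &= \log a_i + 1 - s - r\,e_i = 0, \\
\log a_i &= r\,e_i - 1 + s .
\end{aligned}
```

The term 1 comes from differentiating $a_i \log a_i$; the two conditions themselves then serve to fix r and s.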
In view of (3) we must expect $a_i$ to decrease as $u_i^2 + v_i^2 + w_i^2$
increases, so that we write

$$a_i = k\,e^{-h^2(u_i^2 + v_i^2 + w_i^2)}.$$
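
The continuous argument may be checked against a direct search in a small hypothetical case. The sketch below assumes three cells in which u^2 + v^2 + w^2 takes the values 0, 1 and 2 (arbitrary illustrative numbers, not taken from the text), and looks among all integral sets with the prescribed total number and vis viva for the one which makes n!/(a_1! a_2! a_3!) greatest. The common factor V^n is left out, since it does not affect the comparison.

```python
from math import factorial


def multinomial(counts):
    """Ways of dividing sum(counts) objects into groups of the given sizes."""
    ways = factorial(sum(counts))
    for a in counts:
        ways //= factorial(a)  # each partial quotient is an exact integer
    return ways


def most_probable_distribution(n, total_energy, energies=(0, 1, 2)):
    """Search every integral occupation of three cells meeting both conditions."""
    best, best_ways = None, -1
    for a2 in range(n + 1):
        for a1 in range(n - a2 + 1):
            counts = (n - a1 - a2, a1, a2)
            energy = sum(a * e for a, e in zip(counts, energies))
            if energy != total_energy:
                continue  # violates the vis viva condition
            ways = multinomial(counts)
            if ways > best_ways:
                best, best_ways = counts, ways
    return best


if __name__ == "__main__":
    print(most_probable_distribution(70, 40))
```

With n = 70 and E = 40 the search returns (40, 20, 10), a geometric progression, which is what the exponential form requires when the cell values are equally spaced.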
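In words, this tells us that the most likely distribution of
velocities is that where the number close to the values u, v, w
is proportional to

$$e^{-h^2(u^2 + v^2 + w^2)}.$$

To find the values of the constants, we have, first,

$$n = k\int_{-\infty}^{\infty} e^{-h^2u^2}\,du \int_{-\infty}^{\infty} e^{-h^2v^2}\,dv \int_{-\infty}^{\infty} e^{-h^2w^2}\,dw, \qquad k = n\Bigl(\frac{h}{\sqrt{\pi}}\Bigr)^{3}.$$
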

This process is very much the sort of 'near mathematics'
to which we have frequently resorted, for we have integrated
between infinite limits, whereas in view of (3) every velocity
component must be less than $\sqrt{E}$. However, since the value
of $e^{-h^2u^2}$ is very small for every large u, the error is not
serious. To find h, we must remember that the number of
molecules in a cell is about n times the probability that
a molecule should be in that cell, hence

$$E = n\Bigl(\frac{h}{\sqrt{\pi}}\Bigr)^{3}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (u^2 + v^2 + w^2)\,e^{-h^2(u^2 + v^2 + w^2)}\,du\,dv\,dw$$

$$= 3n\Bigl(\frac{h}{\sqrt{\pi}}\Bigr)^{3}\int_{-\infty}^{\infty} u^2 e^{-h^2u^2}\,du \int_{-\infty}^{\infty} e^{-h^2v^2}\,dv \int_{-\infty}^{\infty} e^{-h^2w^2}\,dw = \frac{3n}{2h^2},$$

$$h^2 = \frac{3n}{2E}.$$

Theorem 3] The most probable distribution of velocities among
the molecules of a gas is that where the number of those
having velocities within the limits $u \pm \tfrac12 du$, $v \pm \tfrac12 dv$,
$w \pm \tfrac12 dw$ is nearly equal to

$$n\Bigl(\frac{h}{\sqrt{\pi}}\Bigr)^{3} e^{-h^2(u^2 + v^2 + w^2)}\,du\,dv\,dw, \qquad h^2 = \frac{3n}{2E}.$$

Here n is the number of molecules, E the given vis
viva, the mass of each being taken as 2.
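
The two constants of Theorem 3] can be verified numerically. The sketch below assumes the law in the form n (h/sqrt(pi))^3 exp(-h^2 (u^2 + v^2 + w^2)) with h^2 = 3n/(2E), and uses the fact that the triple integrals split into products of single ones. The midpoint rule, the cut-off at 10/h and the function names are choices of ours.

```python
import math


def midpoint_integral(func, lo, hi, steps=4000):
    """Composite midpoint rule for a single integral."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))


def check_maxwell_constants(n, E):
    """Return (total number, total vis viva) implied by the law of Theorem 3."""
    h = math.sqrt(3.0 * n / (2.0 * E))
    limit = 10.0 / h  # the integrand is negligible beyond this point
    gauss = midpoint_integral(lambda t: math.exp(-(h * t) ** 2), -limit, limit)
    second = midpoint_integral(
        lambda t: t * t * math.exp(-(h * t) ** 2), -limit, limit
    )
    k = n * (h / math.sqrt(math.pi)) ** 3
    total_number = k * gauss ** 3
    # By symmetry the three squared components contribute equally.
    total_vis_viva = 3.0 * k * second * gauss ** 2
    return total_number, total_vis_viva


if __name__ == "__main__":
    print(check_maxwell_constants(1000.0, 3000.0))
```

For n = 1000 and E = 3000 the two returned values agree with n and E to within the error of the quadrature.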

This  is  Maxwell's  law  for  the  distribution  of  velocities ; 
a  gas  in  this  condition  is  said  to  be  in  a  normal  state.  The 
equation  is  so  well  known  historically  that  we  reproduce  his 
original  proof.* 

'Let N be the whole number of particles. Let u, v, w be
the components of the velocity in the three rectangular
directions, and let the number of particles for which u lies
between u and u + du be N f(u) du, where f is a function to
be determined.

The number of particles for which v lies between v and
v + dv will be N f(v) dv, and the number for which w lies
between w and w + dw will be N f(w) dw, where f always
stands for the same function.

Now the existence of the velocity u does not, in any way,
affect the existence of the velocities v or w, since they are at
right angles to each other and independent, so that the
number of particles whose velocities lie between u and u + du
and between v and v + dv, and also between w and w + dw, is

$$N f(u)\,f(v)\,f(w)\,du\,dv\,dw.$$

If we imagine N particles to start from the origin at the
same instant, then this will be the number in the unit of
volume du dv dw after the unit of time, and the number per
unit volume will be

$$N f(u)\,f(v)\,f(w).$$

But the directions of the coordinates are perfectly arbitrary,
and therefore this number must depend upon the distance
from the origin alone; that is

$$f(u)\,f(v)\,f(w) = \phi(u^2 + v^2 + w^2).$$

Solving this functional equation, we find

$$f(u) = C e^{A u^2}, \qquad \phi(u^2 + v^2 + w^2) = C^3 e^{A(u^2 + v^2 + w^2)}.$$'

* Maxwell, Collected Works, vol. i, pp. 380 ff. Also Jeans, loc. cit., p. 55.

This  simple  proof  is,  unfortunately,  very  unsound,  as  it  is 
not  at  all  clear  that  the  components  of  the  velocities  may  be 
treated  as  independent  variables,  and  in  fact,  if  one  component 
were  equal  to  the  square  root  of  the  vis  viva,  all  others  would 
have  to  be  equal  to  0.* 

¶ § 4. Amplification of the Preceding Proof.

There  are  certain  points  in  the  deduction  of  the  Normal 
Law  of  Maxwell  which  deserve  more  careful  mathematical 
investigation:  it  is  now  time  to  return  to  them.  Logically, 
we  should  have  cleaned  up  everything  as  we  went  along,  but 
this  would  have  burdened  the  argument  to  an  unbearable 
extent. 

The first point to notice is the fundamental role played by
Stirling's formula. Now that formula, as stated in Ch. III,
tells us that

r!  1 

1  < ,  <  1  + 


hence  the  error  made  in  evaluating 

is  of  the  same  order  of  magnitude  as 

1/11  1  \ 

—  (-  +  -  +...+  -) 

or  as  ^/^^h- 

Both  of  these  quantities  are  large ;  it  is  not  clear  what  the 
nature  of  their  ratio  will  be. 

* Cf. Bertrand, loc. cit., p. 30.

To begin with, it should be noticed that there is no harm
done if, in deducing our general law, we reject a small
number of molecules. In fact we may reject a number
increasing with n, provided that it does not increase as fast
as n. V depends upon the number of cells; it will remain
constant if the total volume in the u, v, w space remain fixed.
But we have no right to assume that this total volume does
not increase with n, for it is limited only by (3) and the vis
viva clearly increases with the number of molecules. We
shall not go far wrong in assuming that E increases
proportionately with n. In order to keep the volume below
a certain upper limit, we may have to reject a certain number
of molecules of the highest velocity. But if we put E = pn,
we see that the number of molecules, the squares of whose
velocities are greater than p, cannot increase proportionately
with n; hence the number rejected will increase less rapidly
than n, and do no harm. We may choose $\nu$ once for all, and
then assume that all of the a's are so large that $\nu/a_k$ is small.
The following difficulty is more serious. If we take
different velocity points in the same cell of u, v, w space,
their coordinates are not identical. We have the more exact
equation

$$\sum_{k=1}^{\nu}\sum_{j=1}^{a_k}\bigl[(u_k + \delta_j u_k)^2 + (v_k + \delta_j v_k)^2 + (w_k + \delta_j w_k)^2\bigr] = E.$$

We may imagine that every increment $\delta_j u_k$, $\delta_j v_k$, $\delta_j w_k$ lies
within the small limits $\pm\tfrac{\theta}{2}$, where $\theta$ is very small, thanks to
the size of $\nu$.

The point in the space of 3n dimensions with coordinates

$$u_k + \delta_j u_k,\quad v_k + \delta_j v_k,\quad w_k + \delta_j w_k,$$

which represents a distribution of velocities in $\nu$ cells, will lie
in a 3n-dimensional hypercube of edge $\theta$. The quantity
which we call E is the square of the distance of this point
from the origin and the total variation of this cannot exceed
the length of a diagonal of the small hypercube, $\theta\sqrt{3n}$.
The difference of the distances of two points from a third
point is less than their distance from one another; hence

$$\Bigl|\sqrt{\sum_{k=1}^{\nu} a_k\,(u_k^2 + v_k^2 + w_k^2)} - \sqrt{E}\,\Bigr| < \theta\sqrt{3n},$$

$$\sum_{k=1}^{\nu} a_k\,(u_k^2 + v_k^2 + w_k^2) - \bigl(\sqrt{E} + \beta\theta\sqrt{3n}\bigr)^2 = 0, \qquad -1 < \beta < 1.$$

We find, as before,

$$a_i = k\,e^{-h^2(u_i^2 + v_i^2 + w_i^2)}, \qquad h^2 = \frac{3n}{2\bigl(\sqrt{E} + \beta\theta\sqrt{3n}\bigr)^2}.$$

Since $\theta$ is very small, $\sqrt{\dfrac{3n}{2E}}$ is a good value for h.

¶ § 5. Probability of a Nearly Normal State.

We  have  seen  that  the  most  probable  distribution  of  velocities 
is  that  given  by  Maxwell's  law  for  the  normal  state,  given  in 
Theorem  3].  This  knowledge  does  not,  however,  carry  us 
very  far,  as  we  have  not  much  idea  how  likely  it  is  that  the 
distribution  of  velocities  in  a  given  gas  will  be  according  to 
this  law,  or  nearly  so.  This  difficult  problem  must  now 
claim  our  attention. 

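Let us begin by shifting slightly the number of molecules
in each cell so that the number in the l-th cell is now

$$a_l + \alpha_l,$$

where

$$\sum_{l=1}^{\nu} \alpha_l = 0, \qquad \sum_{l=1}^{\nu} \alpha_l\,(u_l^2 + v_l^2 + w_l^2) = 0. \tag{8}$$

We replace (7) by

$$\log P' = (n + \tfrac12)\log n + n\log V - \Bigl(\frac{\nu - 1}{2}\Bigr)\log 2\pi - \sum_{l=1}^{\nu}(a_l + \alpha_l + \tfrac12)\log(a_l + \alpha_l)$$

$$= \log P - \sum_{l=1}^{\nu}\Bigl[\alpha_l\log a_l + (a_l + \alpha_l + \tfrac12)\log\Bigl(1 + \frac{\alpha_l}{a_l}\Bigr)\Bigr]$$

$$= \log P - \sum_{l=1}^{\nu}\Bigl[\alpha_l\log a_l + \alpha_l + \frac{\alpha_l^2}{2a_l}\Bigr]$$

nearly.

Since

$$a_l = k\,e^{-h^2(u_l^2 + v_l^2 + w_l^2)},$$

by (8),

$$\sum_{l=1}^{\nu}\alpha_l\,(\log a_l + 1) = \sum_{l=1}^{\nu}\alpha_l\bigl[\log k + 1 - h^2(u_l^2 + v_l^2 + w_l^2)\bigr] = 0,$$

$$\log P' = \log P - \sum_{l=1}^{\nu}\frac{\alpha_l^2}{2a_l},$$

$$P' = P\,e^{-\sum_{l=1}^{\nu}\frac{\alpha_l^2}{2a_l}}.$$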
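
The closeness of this approximation may be judged in a small hypothetical case. The sketch below compares the exact change in log P, computed from factorials by way of the log-gamma function, with the approximate value, namely minus the sum of alpha_l^2/(2 a_l). The three cells, with values 0, 1, 2 of u^2 + v^2 + w^2, and the numbers chosen are assumptions of ours; they are so arranged that log a_l is linear in the cell value and that both conditions (8) hold.

```python
import math


def log_weight_change_exact(a, alpha):
    """log P' - log P, where P is proportional to 1 / (a_1! a_2! ...)."""
    return sum(
        math.lgamma(x + 1) - math.lgamma(x + d + 1) for x, d in zip(a, alpha)
    )


def log_weight_change_approx(a, alpha):
    """The approximate value: minus the sum of alpha_l^2 / (2 a_l)."""
    return -sum(d * d / (2.0 * x) for x, d in zip(a, alpha))


if __name__ == "__main__":
    a = (4000, 2000, 1000)    # most likely numbers, a geometric progression
    alpha = (10, -20, 10)     # discrepancies satisfying both conditions (8)
    print(log_weight_change_exact(a, alpha))
    print(log_weight_change_approx(a, alpha))
```

In this case the two values differ only in the third decimal place; the neglected terms are of the order of alpha_l/a_l.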

We wish to find the sum of the values P' for all integral
sets of values $\alpha_1, \dots, \alpha_\nu$ compatible with (8) and with a
condition of size, let us say

$$\sum_{l=1}^{\nu}\frac{\alpha_l^2}{2a_l} \le z_0. \tag{10}$$

We seek the value of

$$\pi = P\sum e^{-\sum_{l=1}^{\nu}\frac{\alpha_l^2}{2a_l}}$$

for all integral values of $\alpha_1, \dots, \alpha_\nu$ compatible with (8) and (10).
It is clear that we must, first of all, replace our summation
by some sort of definite integral. Let the number of groups
of $\alpha$'s be N, the corresponding values of P' being $P'_1, P'_2, \dots, P'_N$:

$$\pi = \sum_{i=1}^{N} P'_i.$$

Consider the space of $\nu$ dimensions where a point has
coordinates $\alpha_1, \alpha_2, \dots, \alpha_\nu$. Equations (8) give us two
hyperplanes in this, and (10) a hyperellipsoid. The section is
a hyperellipsoid in a space of $\nu - 2$ dimensions whose volume
we call W. Let each point within this hyperellipsoid with
integral coordinates be enclosed in a separate region of volume
$\int dv$; then

$$P'_i = \frac{\int P'_i\,dv}{\int dv},$$

where the integral is extended throughout the whole of the
region including the point. As N is very large indeed, and
the region small, this is close to the value obtained by
replacing $P'_i$ by the continuous variable P'. Then

$$\int dv = \frac{W}{N}$$

approximately, and so

$$P'_i = \frac{N}{W}\int P'\,dv, \qquad \pi = \frac{N}{W}\int P'\,dv,$$

where this last integral is taken over the whole of the
hyperellipsoid. We may re-write this

$$\pi = \frac{NP}{W}\int e^{-\sum_{l=1}^{\nu}\frac{\alpha_l^2}{2a_l}}\,dv.$$

We next put

$$z = \sum_{l=1}^{\nu}\frac{\alpha_l^2}{2a_l}.$$

By altering z we have a set of concentric ellipsoids. If the
volume of one of these be written f(z), then

$$\pi = \frac{NP}{W}\int_0^{z_0} e^{-z} f'(z)\,dz, \qquad W = \int_0^{z_0} f'(z)\,dz = f(z_0),$$

$$\pi = \frac{NP}{f(z_0)}\int_0^{z_0} e^{-z} f'(z)\,dz.$$

A change in z is in the nature of a central similitude in
the space of $\nu$ dimensions, which carries each of the
hyperplanes (8) into itself, and permutes the hyperellipsoids in this
space. Hence it permutes also the hyperellipsoids in the
space of $\nu - 2$ dimensions, and we shall have a relation

$$f(z) = K z^{\frac{\nu - 2}{2}},$$

as we prove by beginning with the case where all of the a's
are equal, and then imposing a homogeneous strain.

$$f(z_0) = K z_0^{\frac{\nu - 2}{2}}, \qquad f'(z) = K\,\frac{\nu - 2}{2}\,z^{\frac{\nu - 4}{2}},$$

$$\pi = \frac{N}{z_0^{\frac{\nu - 2}{2}}}\cdot\frac{\nu - 2}{2}\,P\int_0^{z_0} e^{-z} z^{\frac{\nu - 4}{2}}\,dz.$$


Now if $z_0$ be allowed to increase indefinitely we shall
eventually include all sets of integral values of $\alpha_1, \alpha_2, \dots, \alpha_\nu$
compatible with (8) and our probability $\pi$ becomes a certainty.
Moreover, the integrand becomes tiny when z is very large,
so we assume that we get certainty by integrating out to
infinity. Lastly, $N / z_0^{\frac{\nu - 2}{2}}$ is proportional to the number of
integral points divided by the volume of the hyperellipsoid
and varies little from a fixed constant H. Hence

$$1 = H\,\frac{\nu - 2}{2}\,P\int_0^{\infty} e^{-z} z^{\frac{\nu - 4}{2}}\,dz = H\,\frac{\nu - 2}{2}\,P\,\Gamma\Bigl(\frac{\nu - 2}{2}\Bigr),$$

$$\pi = \frac{1}{\Gamma\bigl(\frac{\nu - 2}{2}\bigr)}\int_0^{z_0} e^{-z} z^{\frac{\nu - 4}{2}}\,dz. \tag{11}$$

The only quantity here which depends on n is $z_0$, for we
have already seen that $\nu$ may be chosen once for all. If
n increase indefinitely, we may expect $a_l$ to increase about in
proportion. If we allow $\alpha_l$ to increase in about the same
ratio, then $z_0$ will increase about proportionately with n
and $\pi$ will approach 1 as a limit.

We can express this more accurately. The quantities $a_l$
are those which give the most likely distribution, the $\alpha_l$'s
are discrepancies, and the ratios $\dfrac{\alpha_l}{a_l} = \beta_l$ are the relative
discrepancies. We have the inequality

$$\sum_{l=1}^{\nu}\frac{a_l\,\beta_l^2}{2z_0} \le 1. \tag{12}$$

If we start with the $\beta_l$'s, then from (12) we may assume the
$a_l$'s to increase proportionately with $z_0$.
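
The approach of the probability to 1 can be seen numerically. The sketch below takes formula (11) to be the ratio of the integral of e^{-z} z^{(nu-4)/2} from 0 to z_0 to the complete gamma integral, with index (nu - 2)/2, and evaluates it by the usual power series for the incomplete gamma function. This reading of (11), the function names and the number of terms are assumptions of ours.

```python
import math


def regularized_lower_gamma(s, x, terms=500):
    """(1/Gamma(s)) * integral from 0 to x of e^{-z} z^{s-1} dz, for s > 0."""
    if x <= 0.0:
        return 0.0
    log_x = math.log(x)
    total = 0.0
    for k in range(terms):
        # Series term: x^(s+k) e^{-x} / Gamma(s + k + 1), formed in logarithms.
        total += math.exp((s + k) * log_x - x - math.lgamma(s + k + 1))
    return total


def probability_near_normal(nu, z0):
    """Formula (11) for nu cells and the bound z0 on the discrepancies."""
    return regularized_lower_gamma((nu - 2) / 2.0, z0)


if __name__ == "__main__":
    for z0 in (1.0, 5.0, 20.0, 40.0):
        print(z0, probability_near_normal(10, z0))
```

With ten cells the value is small for z_0 = 1 and is indistinguishable from 1 by z_0 = 40; a larger number of cells merely postpones the approach.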

Theorem 4] The probability that the distribution of velocities
shall differ from the normal one in such a way that
each relative discrepancy is less than some assigned
quantity, will approach 1 as a limit if the number of
molecules increase indefinitely and the vis viva increase
proportionately with them.

Theorem 5] If the number of molecules be very large, and if
a gas be taken at random, it is practically certain that
the distribution of velocities will differ but little from
the normal one.

Theorem 6] If a large number of gas specimens be examined,
each containing the same number of molecules of the
same size and mass, with the same vis viva and equal
containers, in the vast majority of cases the distribution
of velocities will differ but little from the normal one.

We have stated this theorem in three different ways, in
order to contrast them with another statement which seems
less legitimate:

'This completes our information about the motion of the
gas. At any instant it is infinitely probable that it is in
the normal state. In the course of the motion departures
from the normal state will occur, but it is infinitely probable
that these will occupy but an infinitesimal fraction of the
time occupied by the motion.' *

This conclusion seems unwarranted. Returning to the
representation by means of points in higher space, each
representing point will trace a trajectory made up of rectilinear
segments, followed by jumps along the hypersurfaces (2). If
a large number of representing points be started on their
journeys at the same moment, a large majority of them will
always be found in regions corresponding to normal
distributions of velocity. But it does not seem at all clear that
the paths are such that a minority of points may not stay
most of the time in regions corresponding to abnormal
distributions of velocity.† Perhaps the best statement we can
make is the following:‡

Theorem 7] If a gas specimen be chosen at random from
a very large number, all with equal containers and
equal vires vivae, it is immensely probable that it will
have a nearly normal distribution of velocities most of
the time.

§  6.    Distribution  in  Space. 

The  work  which  we  have  done  so  far  has  been  exclusively 
on  the  distribution  of  velocities ;  the  question  arises  naturally 
whether  we  may  not  carry  on  a  similar  discussion  of  the 
distribution  of  the  molecules  in  space. 

* Jeans, loc. cit., p. 55.

† This possibility is hinted at ibid., following paragraph.

‡ The discussion of these points in Castelnuovo, loc. cit., pp. 290 ff., is
admirable.

We begin by replacing our coordinates u, v, w by x, y, z.
These must all be finite since the gas is supposed to be in
a finite container. This container we may imagine divided
into a number of equal cells as before. At this point the
analogy breaks down. In the case of velocities, starting
with the assumption that the probability that a gas was in
a certain state was proportional to the volume of a certain
region in the space of 6n dimensions, we noted that the
limiting conditions (2) do not involve the velocity coordinates.
Hence the probability was proportional to the volume of
a region in a 3n-dimensional space and that was proportional
to the product of the volumes of n regions in the
three-dimensional space u, v, w. But a similar line of reasoning
is not applicable to the x, y, z coordinates, for they appear in
(2). The matter is even more evident on purely physical
grounds. The velocity coordinates of non-colliding molecules
are totally independent, and are in no danger of 'crowding'
one another, but the fact that a certain molecule lies in
a certain small region reduces the probability that a second
molecule should be therein. This shows the illegitimacy of
the 'assumption of molecular chaos' which is used to deduce
Maxwell's Law from dynamical considerations. This
assumption may be stated as follows: *

'It is usual to assume that the molecules having velocity
components within any specified limits are, at every instant
throughout the motion of the gas, distributed at random,
independent of the positions or velocities of the other
molecules, provided, only, that two molecules do not occupy the
same space.'

The matter assumes quite a different aspect when we
assume that the diameters of the molecules are negligible.
Here we retain only those inequalities (2) which have to do
with the container, and these involve the positions of the
molecules separately. Equation (3) drops away. We may
repeat our previous reasoning: where the quantity called r is
equal to 0, the $a_i$'s will all be equal.

Theorem 8] When the radii of the molecules are negligible,
the most likely distribution is a uniform one throughout
the container.

The reasoning previously employed to find P' is still valid:

¶ Theorem 9] When the radii of the molecules are negligible,
there is a very great probability that at any instant
the distribution will be nearly uniform throughout the
container.

¶ Theorem 10] When the radii of the molecules are negligible,
the assumption of molecular chaos is legitimate.

*  Jeans,  loc.  cit.,  p.  17. 


CHAPTER  XI 

THE  PRINCIPLES  OF  LIFE  INSURANCE* 

§  1.    Calculation  of  Life  Probabilities. 

The  fundamental  question  on  which  the  whole  theory  of 
life insurance is based is the probability that a certain
individual shall survive a certain time. From one point of
view  the  statement  of  this  question  is  nonsense  on  its  face : 
our  times  are  in  the  hands  of  God ;  the  probability  does  not 
exist.  But  let  us  remember  that  from  the  very  start  we  have 
clung  closely  to  a  statistical  definition  of  probability ;  that 
definition  will  stand  us  in  good  stead  now.  The  problem 
means  essentially  this.  An  individual  is  classed  as  a  member 
of  a  recognized  category  of  a  sort  that  has  been  long  under 
observation.  What  proportion  of  that  category  may  we,  as 
a  result  of  statistical  inquiry,  expect  to  survive  the  time  in 
question  ? 

* The masterwork on the subject of the present chapter is the Institute of
Actuaries Text-book, Part II, by George King, 2nd edition, London, 1902. See
also Czuber, loc. cit., vol. ii.

The category in which a healthy individual is usually classed,
for purposes of life insurance, is that of his age. The
fundamental problem can be put in the following more exact form:

'What is the probability that a healthy individual of age x
will survive one year?' The probability that he will survive
two years is the product of the probability that he will
survive one year, multiplied by the probability that a person
one year older will also survive one year, and so on for
a number of years. The question of how to compute these
probabilities statistically is ever so much harder than one
would suppose at first. One would be inclined to say: 'Why,
all you have to do is to take a census of a large number
of persons of age x at a certain date, check them up a year
later to see how many are alive, and form the quotient.'
Unfortunately, this is quite impracticable. To begin with,
the category is too elastic. If at a certain date two persons
give their ages as x, one may be 364 days older than the
other, a serious divergence in the later ages. In fact it might
be wiser to assign the age x - 1 to the one, or x + 1 to the
other. Moreover, unless all of the individuals were soldiers
or convicts or of some other non-representative sort, it is
impossible to keep track of them all. Some will escape
observation during the course of the year, and it will be
impossible to say at the end of that time whether they are
alive or dead.
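
The rule of products stated earlier in this section, by which the chance of surviving several years is built up from the one-year chances at successive ages, may be sketched as follows. The table of rates in the example is invented for illustration and corresponds to no published life table.

```python
def survival_probability(one_year_rates, age, years):
    """Chance that a person of the given age survives the given number of years.

    one_year_rates maps each age x to the chance of surviving from x to x + 1.
    """
    probability = 1.0
    for x in range(age, age + years):
        probability *= one_year_rates[x]
    return probability


if __name__ == "__main__":
    rates = {40: 0.99, 41: 0.98, 42: 0.97}  # hypothetical one-year chances
    print(survival_probability(rates, 40, 3))
```

The whole difficulty discussed above lies in obtaining the table; once it is given, the computation itself is trivial.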

Difficulties  of  a  somewhat  different  sort  arise  when  we  try 
to  compute  the  probability  of  surviving  from  birth  and  death 
statistics.  If  a  man  be  born  in  the  year  1900  and  die  in  the 
year  1925,  he  may  die  at  the  age  of  24  or  25.  If  in  the  year 
1925  a  man  give  his  age  as  25  years,  he  may  have  been  born 
in  the  year  1899  or  the  year  1900.  If  a  man  born  in  1900 
die  at  the  age  of  25  years,  he  may  die  in  1925  or  1926. 

Further complications arise for an insurance company
which tries to calculate life probabilities from its own
experience. Suppose that, at the beginning of a certain year,
the number of persons insured of a given age is known.
A year later the books of the company are re-examined and
the number of persons, ostensibly a year older, is observed.
The ratio will by no means give the probability for surviving
one year; the two sets of figures bear on different, if
overlapping, categories. Among those who appear in the second
count are some who did not appear the year before, because
they took out their first policies during that year. On the
other hand, of those whose names appear the first year, some
will disappear, and the company will not know whether they
survive or not. It is still worse if the company make use
of the lists of deaths, for those who die during the year at
the same age will have been born in two different years,
and will appear under different years in the birth registers,
and some will have taken out their first insurance in the
course of the year.

It is evident that, in view of all these difficulties, no perfect
calculation is possible; the best we can do is to adopt certain
arbitrary, if plausible, conventions. To begin with, different
insurance companies combine their experience. Some of the
best life tables are those of twenty British companies, and
the large American companies combine their experience also.
Secondly, statistics are made up as of January 1, but each
individual is given a fictitious birthday, the 1st of July
nearest to the date of his actual birth. This corresponds to
the tolerably reasonable assumption that those who announce
a certain age on January 1 have their birthdays scattered
pretty evenly over a twelvemonth. In the same way, it is
assumed that all who take out or surrender their policies
during a year, do so on July 1. Let $L_x$ be the number of
persons aged x whose names appear on the company's books
on the 1st of January;* the number giving their age as x + 1
a year later shall be $L_{x+1}$. Let $p_x$ be the chance that a person
aged x will survive one year. Let $i_x$ be the number of new
policy-holders aged x who enter during the year, $e_x$ the
number who left. The probability of surviving half a year
will be about

$$1 - \tfrac12(1 - p_x) = \tfrac12(1 + p_x).$$

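Now $L_{x+1}$ is made up of the survivors of $L_x$ plus the
surviving immigrants, and less the surviving emigrants, i.e.

$$L_{x+1} = L_x\,p_x - \frac{e_x}{2}(1 + p_x) + \frac{i_x}{2}(1 + p_x),$$

$$p_x = \frac{L_{x+1} - \tfrac12(i_x - e_x)}{L_x + \tfrac12(i_x - e_x)}.$$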
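
The relation above, solved for the one-year chance of survival, lends itself to direct computation. In the sketch below the function and argument names are ours, and the figures in the example are invented: they were made up from a chance of 0.98 so that the estimate can be seen to return it.

```python
def one_year_survival(L_x, L_next, entrants, exits):
    """Estimate p_x from two January counts and the year's movements.

    Entrants and exits are both treated as occurring on July 1, so that
    each is exposed to half a year, survived with chance (1 + p_x) / 2.
    """
    half_net = 0.5 * (entrants - exits)
    return (L_next - half_net) / (L_x + half_net)


if __name__ == "__main__":
    # 10000 on the books, 400 entrants, 200 exits, true chance 0.98:
    # the second count is 10000 * 0.98 + 100 * 1.98 = 9998.
    print(one_year_survival(10000, 9998, 400, 200))
```

When entrants and exits balance, the estimate reduces to the simple quotient of the two counts.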

* The notation used throughout this chapter is the universal one adopted
at the second International Actuarial Congress, London, 1898.

It must not be imagined that after the various $p_x$'s have
been calculated in this way the results are in final shape. It
is clear that no one will be perfectly accurate, not even the
best value obtainable. In fact if we plot each $p_x$ as an
ordinate corresponding to the abscissa x, we have points of
a broken line that waves up and down. The next step is
to 'graduate' these results, and consists essentially in altering
the ordinates slightly so that the resulting points lie on
a smooth curve, which sinks continuously after the years of
early childhood. This graduation may be accomplished in
a large number of ways. We may replace the middle one
of each triad of successive points by the centre of gravity of
the triangle with these points as vertices. This 'will bring
down the mighty from their seat, and exalt the lowly and
meek'. If need be, the process may be repeated several
times. Another plan is to take a number of points greater
than three and find, by least squares, the parabola of vertical
axis which lies nearest to them. A very simple way is to
plot the points and then run a smooth curve as near them as
possible with the aid of a spline or some other instrument.
Practical actuaries seem to find this method as good as
any other.*
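
The first of these methods of graduation is easily sketched. For equally spaced ages the centre of gravity of the triangle on three successive points has the middle abscissa and the mean of the three ordinates, so that each interior ordinate is simply replaced by the mean of itself and its two neighbours. The treatment of the two end ordinates, which are left untouched here, is an assumption of ours; the text does not say what is done with them.

```python
def graduate_once(ordinates):
    """Replace each interior ordinate by the mean of its triad.

    Every triad is taken from the original values, not from values
    already smoothed; the first and last ordinates are kept as they are.
    """
    smoothed = list(ordinates)
    for i in range(1, len(ordinates) - 1):
        smoothed[i] = (ordinates[i - 1] + ordinates[i] + ordinates[i + 1]) / 3.0
    return smoothed


def graduate(ordinates, passes=1):
    """Repeat the process the stated number of times."""
    values = list(ordinates)
    for _ in range(passes):
        values = graduate_once(values)
    return values


if __name__ == "__main__":
    print(graduate_once([0.0, 3.0, 0.0, 3.0, 0.0]))
```

Points already lying on a straight line are left unchanged, while a zigzag is flattened at each pass.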

An ideal way to calculate the probability of survival would
be to find an explicit function for $p_x$. Various attempts have
been made to find some such function, the most successful
being that of the English actuary Makeham, whose method
we shall now explain.†

Let $l_x$ be the number of persons, all born at practically the
same time, who reach the age x. The probability of surviving
one year is

$$p_x = \frac{l_{x+1}}{l_x}. \tag{2}$$

Let $-\Delta l_x$ be the number of persons of the category x
who die in a short space of time $\Delta x$ thereafter. Then the
instantaneous death-rate, called the 'instantaneous force of
mortality', is

$$\mu_x = \lim_{\Delta x \to 0}\frac{-\Delta l_x}{l_x\,\Delta x} = -\frac{d\log l_x}{dx}. \tag{3}$$

According to Makeham's assumption, death will arise from
one of two general causes. The first is accident, and may be
looked at as a constant throughout, for younger men are
more active than old ones, and have greater recuperative
power, but also take more risks. The second is decrease in
power to resist disease. If we overlook the accidental deaths
for a moment, we might fairly assume that the rate at which
people were dying at any instant was inversely proportional
to a function f(x) which represents the force of resistance to
disease. Hence

$$-\frac{d\log l_x}{dx} \propto \frac{1}{f(x)}.$$

* Cf. Czuber, loc. cit., vol. ii, pp. 167-200.

† Makeham, Journal of the (British) Institute of Actuaries, London, Jan. 1860.
Unfortunately, I have not been able to verify this reference.


With regard to \(f(x)\), Makeham assumes that in any short
interval a man loses a constant proportion of such force of
resistance as he still has.

We now have the data necessary to calculate the number
of living \(l_x\); we change constants at pleasure throughout our
integration:

\[ \frac{df(x)}{f(x)} = -p\,dx, \]
\[ f(x) = r e^{-px}, \]
\[ \mu_x = A + Bc^x, \]
\[ \log l_x = -Ax - Dc^x - F, \]
\[ l_x = k s^x g^{c^x}, \qquad (4) \]
\[ p_x = s\,g^{c^x (c-1)}. \qquad (5) \]

A simpler formula was devised some time earlier by
Gompertz.* He overlooked the element of chance, and
therefore made \(A = 0\) and \(s = 1\), thus getting

\[ l_x = k\,g^{c^x}, \qquad p_x = g^{c^x (c-1)}. \qquad (6) \]

We may find the values of the constants in Makeham's
formula from four observations as follows:

\[ \log l_x = \log k + x \log s + c^x \log g, \]
\[ \log l_{x+t} = \log k + (x+t) \log s + c^t c^x \log g, \]
\[ \log l_{x+2t} = \log k + (x+2t) \log s + c^{2t} c^x \log g, \]
\[ \log l_{x+3t} = \log k + (x+3t) \log s + c^{3t} c^x \log g. \]

*  Gompertz,  '  On  the  Nature  of  the  Function  expressive  of  the  Law  of 
Human  Mortality',  Philosophical  Transactions,  Royal  Society,  1825. 

The first differences are

\[ \Delta \log l_x = t \log s + (c^t - 1)\,c^x \log g, \]
\[ \Delta \log l_{x+t} = t \log s + c^t (c^t - 1)\,c^x \log g, \]
\[ \Delta \log l_{x+2t} = t \log s + c^{2t} (c^t - 1)\,c^x \log g. \]

The second differences are

\[ \Delta^2 \log l_x = (c^t - 1)^2 c^x \log g, \]
\[ \Delta^2 \log l_{x+t} = c^t (c^t - 1)^2 c^x \log g, \]
\[ \log(\Delta^2 \log l_{x+t}) - \log(\Delta^2 \log l_x) = t \log c. \]

The  other  constants  are  then  easily  found. 
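
The four-observation method can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `makeham_from_four` and the trial constants are invented here, and the ratio of the second differences is used in place of the difference of their logarithms, since the second differences are negative whenever \(g < 1\).

```python
import math

def makeham_from_four(x, t, l0, l1, l2, l3):
    """Recover (k, s, g, c) in l_x = k * s**x * g**(c**x) from the four
    observations l_x, l_{x+t}, l_{x+2t}, l_{x+3t}."""
    logs = [math.log(v) for v in (l0, l1, l2, l3)]
    d1 = [logs[j + 1] - logs[j] for j in range(3)]   # first differences
    d2 = [d1[j + 1] - d1[j] for j in range(2)]       # second differences
    ct = d2[1] / d2[0]                               # ratio of second differences = c**t
    c = ct ** (1.0 / t)
    log_g = d2[0] / ((ct - 1.0) ** 2 * c ** x)       # from the first second difference
    log_s = (d1[0] - (ct - 1.0) * c ** x * log_g) / t
    log_k = logs[0] - x * log_s - c ** x * log_g
    return math.exp(log_k), math.exp(log_s), math.exp(log_g), c

# Round trip on invented constants (not taken from any published table).
k, s, g, c = 100000.0, 0.995, 0.9995, 1.10
l = lambda x: k * s ** x * g ** (c ** x)
print(makeham_from_four(30, 10, l(30), l(40), l(50), l(60)))
```

With exact data the constants are recovered to rounding error; with observed data the result depends heavily on which four ages are chosen, which is one reason for preferring the least-squares plan.
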
Another,  and  better,  plan  is  to  use  all  available  data  and 
determine  the  constants  by  least  squares.     We  write 

\[ \log p_x = \log s + c^x (c-1) \log g. \]

The constants, of which \(\log s\) and \((c-1)\log g\) appear
linearly, may be found by the methods explained in Ch. IX.*
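
One possible way of carrying this out is sketched below in Python. It is an assumption of ours, not the procedure of Ch. IX: for each trial value of \(c\) the two constants that enter linearly are found by ordinary least squares, and the \(c\) giving the smallest residual is kept. The name `fit_makeham_p` and the grid of trial values are hypothetical.

```python
import math

def fit_makeham_p(ages, p, c_grid):
    """Fit log p_x = a + b * c**x, with a = log s and b = (c - 1) log g.
    For each trial c the pair (a, b) is the least-squares solution;
    the c with the smallest sum of squared residuals is returned."""
    y = [math.log(v) for v in p]
    n = len(ages)
    best = None
    for c in c_grid:
        u = [c ** x for x in ages]
        mu, my = sum(u) / n, sum(y) / n
        suu = sum((ui - mu) ** 2 for ui in u)
        suy = sum((ui - mu) * (yi - my) for ui, yi in zip(u, y))
        b = suy / suu
        a = my - b * mu
        rss = sum((yi - a - b * ui) ** 2 for ui, yi in zip(u, y))
        if best is None or rss < best[0]:
            best = (rss, c, a, b)
    _, c, a, b = best
    return math.exp(a), math.exp(b / (c - 1.0)), c   # s, g, c

# Invented survival probabilities, for illustration only.
ages = list(range(30, 81))
p = [0.995 * 0.9995 ** (1.10 ** x * 0.10) for x in ages]
grid = [1.05 + 0.001 * j for j in range(101)]
print(fit_makeham_p(ages, p, grid))
```

The grid must not contain \(c = 1\), for which the formula degenerates.
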

How  much  confidence  should  we  place  in  Makeham's 
formula?  It  is  evident  that  the  assumptions  on  which  it  is 
based  are  nothing  more  than  reasonably  plausible.  The  test 
is  whether  it  really  checks  up  in  practice.  This  is  the  case 
to  a  really  surprising  extent.  A  life  table  calculated  by 
Makeham's  formula  is  better  than  any  but  the  very  best 
table  calculated  by  other  means.  This  is  strikingly  brought 
out by the following figures,† where, unfortunately, we have
available not Makeham's formula but the less accurate one
of Gompertz. The values tabulated are for \(l_x\).

  x    Gompertz    20 British    Duvillard    Deparcieux    Northampton
       Formula.    Co.'s.
 30      890          890           890           890            890
 40      839          813           750           797            737
 50      745          718           603           704            579
 60      584          584           434           561            413
 70      355          382           238           375            250
 80      125          142            71           143             95

* Czuber, loc. cit., vol. ii, pp. 181 ff.
† Bertrand, loc. cit., p. 818.

If we assume that the best tables are those of 20 British
Companies, we see that this table shows that the Gompertz
figures are as accurate as Deparcieux, and distinctly better
than Duvillard or Northampton. As a matter of fact,
Makeham tables are used in practice. As for the Gompertz
formula, that is useful for calculating the probabilities for
contingencies depending on two lives. We see from (6) that
the probability that a person aged \(x\) and another aged \(x'\)
should both survive is \(p_y\), where

\[ c^y = c^x + c^{x'}. \]

This  formula  would  not  hold  for  two  persons  intimately 
connected,  like  husband  and  wife,  for  the  death  of  one  would 
be  likely  to  hasten  the  death  of  the  other. 


§  2.    Endowments  and  Annuities. 

Before taking up the subject-matter of the present section,
we must say a word or two about the mathematics of finance.
In calculating all sorts of insurance values, it must be
understood that the word 'interest' always means 'compound
interest'. If, thus, \(i\) be the rate (usually in the
neighbourhood of 3½ per cent.), the amount of $1 at the end of \(n\) years is

\[ (1+i)^n. \]

To find the present worth of $1 payable at the end of that
time, we write \((1+i)^{-1} = v\); the present worth, or discounted
value, is \(v^n\).

With regard to calculating compound interest, we note that
if the interest be compounded \(m\) times per year, the amount
at the end of \(n\) years will be

\[ \left(1 + \frac{i}{m}\right)^{nm}. \]

Allowing \(m\) to increase indefinitely, the amount at interest
continuously compounded is \(e^{ni}\); if we put \(e^{i'} = 1 + i\) we see
that compound interest or present worth can be reckoned by
assuming the new rate \(i'\) with continuous compounding.
This new rate is called the 'force of interest'.
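
These relations are easily checked numerically. The following Python sketch is illustrative only; the function names are ours, and the rate of 3½ per cent. is simply the one mentioned in the text.

```python
import math

def amount(i, n, m=None):
    """Amount of 1 after n years at yearly rate i, compounded m times
    per year (once per year when m is None)."""
    if m is None:
        return (1.0 + i) ** n
    return (1.0 + i / m) ** (n * m)

def present_worth(i, n):
    """Present worth v**n of 1 payable at the end of n years."""
    v = 1.0 / (1.0 + i)
    return v ** n

def force_of_interest(i):
    """The rate i' satisfying exp(i') = 1 + i."""
    return math.log(1.0 + i)

i, n = 0.035, 10
print(amount(i, n), present_worth(i, n))
print(amount(i, n, m=12), math.exp(n * i))             # monthly against continuous
print(math.exp(n * force_of_interest(i)), amount(i, n))  # these two agree
```
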

A sum of money to be paid at the end of a certain time,
provided that a stated individual is still alive, is called an
'endowment'. What is the present value of $1 payable at
the end of \(n\) years, in case a person now aged \(x\) is then alive?

Every practising actuary is provided with a series of tables
called 'commutation tables' which contain the fundamental
data needed for his purposes.* The first column in such
tables contains the age \(x\), the second

\[ D_x = v^x l_x, \qquad (7) \]

so that the fundamental endowment formula is

\[ v^n \frac{l_{x+n}}{l_x} = \frac{D_{x+n}}{D_x}. \qquad (8) \]

A sum of money which shall be paid at the end of each
year that an individual survives is called an 'Annuity'; the
value of an annuity of $1 based on the life of an individual
aged \(x\) is

\[ a_x = \frac{D_{x+1} + D_{x+2} + \dots}{D_x}. \]

Sometimes it is required that the first payment shall be
made immediately. In that case the sum is called an
'Annuity due'; the Germans have more sonorous titles, calling
them annuities 'postnumerando' and 'praenumerando'
respectively. In the third column of the commutation tables
are the quantities

\[ N_x = \sum_{i=1}^{\omega - x} D_{x+i}, \qquad (9) \]

where \(\omega\) is the age, usually about 100, where an individual
may reasonably be supposed to be certainly dead. We have,
then, for an annuity

\[ a_x = \frac{N_x}{D_x}, \qquad (10) \]

and for an annuity due †

\[ 1 + a_x = \frac{N_{x-1}}{D_x}. \qquad (11) \]
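
The first three columns of a commutation table may be built as in the following Python sketch. The mortality table is a hypothetical one of Makeham form with invented constants, and \(l_x\) is taken as zero beyond \(\omega = 100\); the function names are ours. The column \(N_x\) follows the convention of (9), the sum beginning at \(i = 1\).

```python
def life_table(omega=100, k=100000.0, s=0.995, g=0.9995, c=1.10):
    """Hypothetical table l_x = k s^x g^(c^x) for x = 0..omega."""
    return [k * s ** x * g ** (c ** x) for x in range(omega + 1)]

def commutation(l, i):
    """Return the columns D and N, with N[x] = D[x+1] + D[x+2] + ..."""
    v = 1.0 / (1.0 + i)
    D = [v ** x * lx for x, lx in enumerate(l)]
    N = [0.0] * len(D)
    for x in range(len(D) - 2, -1, -1):
        N[x] = N[x + 1] + D[x + 1]
    return D, N

def endowment(D, x, n):
    """Present value of 1 payable in n years if (x) survives, as in (8)."""
    return D[x + n] / D[x]

def annuity(D, N, x):
    """Annuity of 1, first payment at the end of the year, as in (10)."""
    return N[x] / D[x]

def annuity_due(D, N, x):
    """Annuity due of 1, as in (11); requires x >= 1."""
    return N[x - 1] / D[x]

l = life_table()
D, N = commutation(l, 0.035)
print(endowment(D, 40, 20), annuity(D, N, 40), annuity_due(D, N, 40))
```
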

* Cf. King, loc. cit., pp. 512-45.

† This is the standard notation. Some authors, as Czuber, loc. cit., write
\(N_x = \sum_{i=0}^{\omega - x} D_{x+i}\) and \(a_x\) where we write \(1 + a_x\).
Thus they write (10), meaning (11).

It is interesting to compare the value of \(a_x\) with that
of a certain payment to be made yearly during the period
\(e_x\), the expected life of the individual, i.e. the mean life
of one of his age,

\[ e_x = \frac{l_{x+1} + l_{x+2} + \dots}{l_x}. \qquad (12) \]

The value of the certain payment is

\[ v + v^2 + \dots + v^{e_x} = \frac{1}{i}\left[1 - (1+i)^{-e_x}\right]. \]

We wish to compare this with

\[ a_x = \frac{v\,l_{x+1} + v^2 l_{x+2} + \dots}{l_x}. \]

To  prove  that  this  is  less  than  the  other,  we  must  show 
The  right-hand  side  is  greater  than 

=  ye.  +  1  fe^  -  ^•^■^I'^^-^-^^+'-'+^^  +  ^xl 


Problem. 

Find the value of \(|_n a_x\), an annuity limited to \(n\) payments, of \({}_{m|}a_x\), an
annuity whose first payment will be at the end of \(m\) years, and of \({}_{m|n}a_x\),
which is both limited and postponed.

The inequality is thus established, and the certain
payment has the greater value.
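
The inequality can also be observed numerically. The Python sketch below is an illustration on a hypothetical Makeham table with invented constants, at 3½ per cent.; the function names are ours, and the formula for the certain payment is applied to the (generally non-integral) period \(e_x\).

```python
def curtate_expectation(l, x):
    """e_x = (l_{x+1} + l_{x+2} + ...) / l_x."""
    return sum(l[x + 1:]) / l[x]

def life_annuity(l, x, i):
    """a_x = (v l_{x+1} + v^2 l_{x+2} + ...) / l_x."""
    v = 1.0 / (1.0 + i)
    return sum(v ** t * l[x + t] for t in range(1, len(l) - x)) / l[x]

def annuity_certain(i, years):
    """(1/i) [1 - (1+i)^(-years)]; `years` need not be an integer."""
    return (1.0 - (1.0 + i) ** (-years)) / i

# Hypothetical table, l_x taken as zero beyond age 100.
l = [100000.0 * 0.995 ** x * 0.9995 ** (1.10 ** x) for x in range(101)]
for x in (20, 40, 60, 80):
    e = curtate_expectation(l, x)
    print(x, round(life_annuity(l, x, 0.035), 3), round(annuity_certain(0.035, e), 3))
```

In every row the second figure, the certain payment, is the larger.
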

It is sometimes important to know the value of an annuity
which increases each year. Let the first payment be $1, the
second $2, the third $3, and so on. We have here, really, an
annuity for $1, another for the same sum deferred one year,
a third deferred two years, and so on. The value is thus

\[ \frac{D_{x+1} + 2D_{x+2} + 3D_{x+3} + \dots}{D_x} = \frac{N_x + N_{x+1} + N_{x+2} + \dots}{D_x}. \qquad (13) \]

The fourth column in the commutation table is

\[ S_x = \sum_{t=0}^{\omega - x} N_{x+t}. \]

The value of our complicated annuity is then

\[ S_x / D_x. \qquad (14) \]

§  3.    Single  Payment  Insurance. 

There  are  two  sorts  of  benefits  which  a  Life  Insurance 
Company  is  called  upon  to  pay:  annuities  if  people  survive, 
insurance  if  they  die.  We  have  calculated  the  values  of  the 
principal types of the former benefit; we must now calculate
the  latter.  In  passing,  we  note  that  whereas  when  a  man 
wishes  to  take  out  an  insurance  policy  he  must  pass  a  careful 
physical  examination,  and  give  evidence  that  he  does  not 
follow  an  unusually  dangerous  calling,  when  it  comes  to 
annuities,  the  worse  a  man's  health  and  the  more  dangerous 
his  calling,  the  better  the  Company  will  be  pleased. 

What is the present value of $1 to be paid at the end of
the year in which a person aged \(x\) dies? We call this \(A_x\)
and note that we have various mutually exclusive possibilities
that he may die in any one of the succeeding years.

\[ A_x = \frac{v(l_x - l_{x+1}) + v^2(l_{x+1} - l_{x+2}) + \dots}{l_x} = v(1 + a_x) - a_x. \qquad (15) \]

This formula might have been predicted by the following
reasoning. The Company agrees that if the man be alive at
the beginning of any year it will pay $1 either to him or to
his 'heirs or assigns' at the end of the year. The man
agrees that if he be alive at the end of the year, he will pay
that dollar back. The man agrees to pay the Company an
annuity of $1, the Company agrees to pay an annuity due
for the same amount, but as each payment is postponed a year
the whole must be discounted once. The difference between
these two benefits gives the formula above. It must be
added, however, that this is not the best type of formula for
computation. We shall add to our commutation table columns
based on those who die, not on those who survive. The
number who die at the age \(x\) is

\[ d_x = l_x - l_{x+1}, \qquad A_x = \frac{v\,d_x}{l_x} + \frac{v^2 d_{x+1}}{l_x} + \dots \]

Let

\[ v^{y+1} d_y = C_y, \qquad (16) \]

\[ M_x = \sum_{t=0}^{\omega - x} C_{x+t}, \qquad (17) \]

\[ A_x = M_x / D_x. \qquad (18) \]
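
The agreement between (15) and (18) may be verified on any table. The following Python sketch does so for a hypothetical Makeham table with invented constants; it assumes that every person alive at age \(\omega = 100\) dies within the year, and the function name is ours.

```python
def insurance_columns(l, i):
    """Return the columns D, C, M for a table l[0..omega]."""
    v = 1.0 / (1.0 + i)
    size = len(l)
    # d[x] = deaths at age x; l is taken as zero beyond the last age
    d = [l[x] - (l[x + 1] if x + 1 < size else 0.0) for x in range(size)]
    D = [v ** x * l[x] for x in range(size)]
    C = [v ** (x + 1) * d[x] for x in range(size)]      # as in (16)
    M = [0.0] * size
    M[size - 1] = C[size - 1]
    for x in range(size - 2, -1, -1):                   # as in (17)
        M[x] = M[x + 1] + C[x]
    return D, C, M

i = 0.035
v = 1.0 / (1.0 + i)
l = [100000.0 * 0.995 ** x * 0.9995 ** (1.10 ** x) for x in range(101)]
D, C, M = insurance_columns(l, i)
x = 40
a_x = sum(D[x + 1:]) / D[x]          # annuity, first payment after one year
print(M[x] / D[x], v * (1.0 + a_x) - a_x)   # the two forms of A_x
```
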

Let us, lastly, suppose that the amount of the insurance
will be $1 if the man die in the first year, $2 if in the
second, and so on. We have

\[ \frac{M_x + M_{x+1} + M_{x+2} + \dots}{D_x}. \qquad (19) \]

We make a last column in our commutation tables,

\[ R_x = \sum_{t=0}^{\omega - x} M_{x+t}; \]

the value of this increasing insurance is then

\[ R_x / D_x. \qquad (20) \]

A  much  more  frequent  form  of  contract  is  the  so-called 
'endowment'  policy,  where  the  Company  agrees  to  insure 
a  life  for  n  years,  and  to  pay  the  amount  at  the  end  of  that 
time  in  case  the  person  is  still  alive.  This  is  clearly  the  sum 
of  an  insurance  limited  to  a  certain  number  of  years,  and  an 
endowment  postponed  the  same  number  of  years,  namely 

\[ A_{x\overline{n}|} = (M_x - M_{x+n} + D_{x+n}) / D_x. \qquad (21) \]

This  may  be  transformed  in  a  number  of  ways  which  we 
shall  not  stop  to  explain. 

We  have,  so  far,  assumed  that  the  insurance  would  be  paid 
at  the  end  of  the  year  of  death ;  that  is  not  an  arrangement 
which  usually  commends  itself  in  practice,  most  beneficiaries 
not  caring  to  wait  so  long. 

Suppose that an annuity of $1 is to be paid in \(m\) equal
instalments. Its value is increased, partly because the
beneficiary receives his money earlier each year, partly because
he receives some payments in the year of death. The present
value of the last payment to be made each year is \(a_x/m\).

An annuity due of \(1/m\) payable at the beginning of each
year would have the value

\[ \frac{1 + a_x}{m}. \]

Let us assume that the intervening benefits decrease
proportionately; the total value of the annuity is now

\[ \frac{1}{m}a_x + \frac{1}{m}\left(a_x + \frac{1}{m}\right) + \frac{1}{m}\left(a_x + \frac{2}{m}\right) + \dots = a_x + \frac{m-1}{2m}. \qquad (22) \]


Problem.

Calculate the value of \({}_{m|}A_x\), an insurance where the liability begins
only after \(m\) years, of \(|_n A_x\) where it ceases after \(n\) years, and of \({}_{m|n}A_x\)
where it is limited both ways.

For a continuous annuity we should have

\[ \bar{a}_x = a_x + \tfrac{1}{2}. \qquad (23) \]

In the same way, an insurance policy receives an enhanced
value if the payment be made at the end of a stated fraction
of the year, say \(1/m\)th, in which death occurs, for the Company
loses interest on its money from the end of that term to the
end of the year. Assuming, for simplicity, that the probability
of death is the same throughout all intervals of the year,
a rather inaccurate assumption, and that the interest lost is
only simple interest, the loss to the Company is

\[ i \cdot \frac{1}{m}\left[\frac{m-1}{m} + \frac{m-2}{m} + \dots + \frac{1}{m}\right] = \frac{m-1}{2m}\,i; \]

hence the value under the present contract is

\[ A_x\left(1 + \frac{m-1}{2m}\,i\right). \qquad (24) \]

For immediate payment at death, we have

\[ \bar{A}_x = A_x (1 + i/2). \qquad (25) \]

§  4.    Premiums. 

In all the calculations made so far we have merely
considered the present value of the benefit to be obtained. In
the majority of cases, however, the beneficiary is by no means
in a position to pay down at once the full value of his benefit,
but arranges for payments at stated intervals. Suppose, for
instance, instead of making a single payment for a simple
life policy, the beneficiary wishes to make equal annual
payments, beginning immediately, as long as he lives. What
he undertakes to do is, thus, to pay to the Company an
annuity due for the amount of the premium \(P_x\), and this must
have the same present value as the insurance, hence

\[ P_x (1 + a_x) = A_x, \qquad P_x = \frac{M_x}{N_{x-1}}. \qquad (26) \]

The premium for a policy payable immediately at death is

\[ \frac{M_x (1 + i/2)}{N_{x-1}}. \]

When the beneficiary wishes to limit himself to at most
\(n\) payments, we have

\[ {}_nP_x \left(1 + |_{n-1}a_x\right) = A_x, \qquad {}_nP_x = \frac{M_x}{N_{x-1} - N_{x+n-1}}. \qquad (27) \]

The annual premium for an \(n\)-year endowment policy
will be

\[ P_{x\overline{n}|} = \frac{M_x - M_{x+n} + D_{x+n}}{N_{x-1} - N_{x+n-1}}. \qquad (28) \]
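
The three net annual premiums may be computed from the columns \(D\), \(N\), \(M\) as in the Python sketch below. The table is hypothetical (Makeham form, invented constants, no survivors beyond age 100), the function names are ours, and the figures it prints are not those of any published commutation table.

```python
def columns(l, i):
    """D, N, M for a table l[0..omega]; N[x] = D[x+1] + D[x+2] + ..."""
    v = 1.0 / (1.0 + i)
    size = len(l)
    D = [v ** x * l[x] for x in range(size)]
    C = [v ** (x + 1) * (l[x] - (l[x + 1] if x + 1 < size else 0.0))
         for x in range(size)]
    N, M = [0.0] * size, [0.0] * size
    M[size - 1] = C[size - 1]
    for x in range(size - 2, -1, -1):
        N[x] = N[x + 1] + D[x + 1]
        M[x] = M[x + 1] + C[x]
    return D, N, M

def whole_life_premium(N, M, x):
    """Annual premium for a simple life policy: M_x / N_{x-1}."""
    return M[x] / N[x - 1]

def limited_payment_premium(N, M, x, n):
    """At most n annual premiums: M_x / (N_{x-1} - N_{x+n-1})."""
    return M[x] / (N[x - 1] - N[x + n - 1])

def endowment_premium(D, N, M, x, n):
    """n-year endowment: (M_x - M_{x+n} + D_{x+n}) / (N_{x-1} - N_{x+n-1})."""
    return (M[x] - M[x + n] + D[x + n]) / (N[x - 1] - N[x + n - 1])

l = [100000.0 * 0.995 ** x * 0.9995 ** (1.10 ** x) for x in range(101)]
D, N, M = columns(l, 0.035)
print(whole_life_premium(N, M, 35),
      limited_payment_premium(N, M, 35, 20),
      endowment_premium(D, N, M, 35, 20))
```
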

A not uncommon practice in the case of both insurance
and endowments is to arrange that the premium or premiums
shall all be returned with the benefit. It is not quite clear
why any one should desire this type of policy, except that
it has the appearance of giving the beneficiary something for
nothing, which is always popular. Let us begin with the
simplest case, and find the single premium for an endowment,
which shall give the beneficiary, if alive, the sum of $1 plus
the premium. We have

\[ \pi_x = (1 + \pi_x)\frac{D_{x+n}}{D_x}, \qquad \pi_x = \frac{D_{x+n}}{D_x - D_{x+n}}. \qquad (29) \]

What will be the single premium for a simple life policy,
premium to be returned with the insurance?

\[ \pi_x = (1 + \pi_x)\frac{M_x}{D_x}, \qquad \pi_x = \frac{M_x}{D_x - M_x}. \qquad (30) \]


If immediate payment be required, we must multiply \(M_x\)
by \(1 + i/2\).

Problem.

Calculate the increased cost of \(P_{x\overline{n}|}\) for immediate payment at death.

Single premium for \(n\)-year endowment policy,
premium to be returned:

\[ \pi_x = (1 + \pi_x)\frac{M_x - M_{x+n} + D_{x+n}}{D_x}, \qquad \pi_x = \frac{M_x - M_{x+n} + D_{x+n}}{D_x - M_x + M_{x+n} - D_{x+n}}. \qquad (31) \]

Annual premium on simple life policy, all premiums to be
returned. Here the payment side is an annuity due of the
amount of the premium. The benefit side is two policies, one
for $1, the other of increasing amount, starting with the
premium, and increasing by that amount every year. We
get from (11), (18), and (20)

\[ \pi_x \frac{N_{x-1}}{D_x} = \frac{M_x}{D_x} + \pi_x \frac{R_x}{D_x}, \qquad \pi_x = \frac{M_x}{N_{x-1} - R_x}. \qquad (32) \]

Let us look a little more closely into the wisdom of
stipulating that the premiums shall be returned. Let us take this
last case of simple life policy, premium to be returned. The
beneficiary's expectation is here

\[ 1 + \pi_x \frac{l_{x+1}}{l_x} + \pi_x \frac{l_{x+2}}{l_x} + \dots = 1 + \pi_x e_x; \]

the ratio of benefit expected to premium is, by (32),

\[ \frac{1 + \pi_x e_x}{\pi_x} = \frac{N_{x-1} - R_x}{M_x} + e_x. \]

The ratio of benefit expected to premium, when premiums
are not returned, is

\[ \frac{1}{P_x} = \frac{N_{x-1}}{M_x}. \]


Problems. 

1.  Find  single  premium  for  n-year  endowment  policy,  with  return  of 
premiums,  if  payment  be  made  immediately  after  death. 

2. Find premium for n-year endowment policy, all premiums to be
returned.

To compare these, we must compare \(R_x/M_x\) with \(e_x\).
Turning to a 3½ per cent. commutation table, we find the
figures:

  x     R_x/M_x     e_x
 25       30         29
 40       22         20
 60       13          9

As  the  ratio  of  expected  benefit  to  premium  is  greater  in 
the  simple  case  than  where  premiums  are  returned,  the  former 
would  seem  to  be  the  better  for  the  beneficiary. 

At this point it is necessary to emphasize in the strongest
terms the fact that the premiums which we have calculated
differ very widely from those charged in practice by
commercial insurance companies. These net premiums fail to
provide any reserve to meet the following contingencies:

(1)  Cost  of  operation,  and  interest  on  capital  invested. 

(2)  Fluctuations  in  the  death-rate. 

(3)  Fluctuation  from  the  theoretical  number  of  deaths, 
according  to  Bernoulli's  theorem. 

(4) Decrease in rate of interest obtainable on invested funds.

In order to meet these various contingencies, the premiums
are usually 'loaded' to a greater or less extent. The different
companies  do  not  announce  to  the  world  the  different  bases 
which  they  take  for  calculating  this  loading,  and  this  reticence 
is  very  natural,  but  the  result  is  a  rather  remarkable  diversity 
in  practice.  As  an  example,  we  quote  a  few  figures,  the 
supposed  age  of  the  insured  being  35  years : 

                       Net Premium.   P.D.Q. Company.   X.Y.Z. Company.
20 payment life           0.0311          0.03328           0.03834
20 year endowment         0.0422          0.0467            0.05147

The net figures are calculated from a 3½ per cent.
commutation table, the others from a card published by the
P. D. Q. Company showing how much less its rates were than
those of some score of competitors. The X. Y. Z. was chosen
for comparison because of its high premiums and great size.
The great difference in the premiums is doubtless explained
in large measure by differences in systems of loading. Thus,
some  companies  follow  the  plan  of  loading  the  first  premiums 
very  heavily,  then  dividing  large  slices  of  profit  among  the 
policy-holders.  Insurance  companies  doing  this  have  a  habit 
of employing such adjectives as 'mutual' or 'co-operative' to
describe themselves. We quote from the card where these
figures are found:

'The P. D. Q. Company is distinguished for low rates of
premium on all forms of insurance, also for low expense rate
and its mortality, since organization is lower than that of
any other American Company for a like period. All of its
policies are on the "participating plan", that is, the difference
between the premium and the cost of insurance is determined
by experience, and returned to the policy-holder.'

The  only  insurance  system  with  which  the  writer  is  familiar, 
where  only  net  premiums  seem  to  be  charged,  is  the  United 
States  War  Risk  Insurance. 


§  5.    Surrender  Values. 

At  the  moment  when  an  individual  takes  out  an  insurance 
policy,  his  mathematical  expectation  is  0,  that  is  to  say,  the 
sum  which  he  expects  to  pay  in  net  premiums  has  the  same 
present  value  as  the  benefit  looked  for.  As  time  goes  on  this 
equation ceases to hold. The expected benefit is greater than
the expected outlay, and it would be increasingly advantageous
to the Company for him to cancel the contract. The
difference between what the Company expects to receive from the
premiums  stipulated  for  in  the  past,  and  what  it  would 
expect  from  an  individual  of  the  same  age,  insuring  himself 
for  the  first  time,  is  called  the  '  Surrender  value ',  and  is  about 
the  sum  which,  in  practice,  a  Company  is  willing  to  pay  to 
a  policy-holder,  after  the  first  few  years,  in  return  for  giving 
up  his  insurance.     It  can  be  calculated  in  various  ways. 

Suppose that an individual aged \(x + n\) took out a simple
life policy at the age \(x\). The surrender value will be the
difference between the value of a new policy for a man of his
present age and the value of an annuity due of the amount
of his present premium, namely,

\[ {}_nV_x = \frac{M_{x+n}}{D_{x+n}} - \frac{N_{x+n-1}}{D_{x+n}} \cdot \frac{M_x}{N_{x-1}} = \frac{M_{x+n} N_{x-1} - M_x N_{x+n-1}}{D_{x+n} N_{x-1}}. \qquad (33) \]

This method of calculating is called the 'prospective method'.
It is interesting to compare it with the 'retrospective method',
which may be explained as follows.

At the time when the contract was first made, the
prospective value of the first \(n\) payments was \((1 + |_{n-1}a_x)P_x\).
These payments had two functions: to provide for a temporary
insurance for \(n\) years, and to provide the surrender value at
the end of that time. The difference between the limited
annuity due and the temporary insurance is the surrender
value, multiplied by the probability that the individual will
survive \(n\) years, and discounted for \(n\) years, i.e. an \(n\) years
endowment to the amount of the surrender value. We
thus have

\[ {}_nV_x \frac{D_{x+n}}{D_x} = \frac{N_{x-1} - N_{x+n-1}}{D_x} \cdot \frac{M_x}{N_{x-1}} - \frac{M_x - M_{x+n}}{D_x}, \]

\[ {}_nV_x = \frac{M_{x+n} N_{x-1} - M_x N_{x+n-1}}{D_{x+n} N_{x-1}}. \qquad (34) \]
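
That the two methods lead to the same figure may be confirmed numerically. The Python sketch below uses a hypothetical Makeham table with invented constants and no survivors beyond age 100; the function names are ours.

```python
def columns(l, i):
    """D, N, M for a table l[0..omega]; N[x] = D[x+1] + D[x+2] + ..."""
    v = 1.0 / (1.0 + i)
    size = len(l)
    D = [v ** x * l[x] for x in range(size)]
    C = [v ** (x + 1) * (l[x] - (l[x + 1] if x + 1 < size else 0.0))
         for x in range(size)]
    N, M = [0.0] * size, [0.0] * size
    M[size - 1] = C[size - 1]
    for x in range(size - 2, -1, -1):
        N[x] = N[x + 1] + D[x + 1]
        M[x] = M[x + 1] + C[x]
    return D, N, M

def surrender_prospective(D, N, M, x, n):
    """New policy at age x+n, less an annuity due of the old premium."""
    P = M[x] / N[x - 1]
    return M[x + n] / D[x + n] - P * N[x + n - 1] / D[x + n]

def surrender_retrospective(D, N, M, x, n):
    """Limited annuity due of premiums, less the temporary insurance,
    divided by the value of an n-year endowment of 1."""
    P = M[x] / N[x - 1]
    paid = P * (N[x - 1] - N[x + n - 1]) / D[x]
    insured = (M[x] - M[x + n]) / D[x]
    return (paid - insured) * D[x] / D[x + n]

l = [100000.0 * 0.995 ** x * 0.9995 ** (1.10 ** x) for x in range(101)]
D, N, M = columns(l, 0.035)
print(surrender_prospective(D, N, M, 35, 10),
      surrender_retrospective(D, N, M, 35, 10))
```
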

As  a  matter  of  fact,  policy-holders  do  not  usually  surrender 
their  policies,  and,  in  consequence,  a  large  insurance  company 
is  obliged  to  have  continually  on  hand  a  very  large  reserve. 
This great sum of money gives to the Company much
importance in the world of finance. Moreover, there is rather a nice
ethical question as to who is, in reality, the owner of this
reserve, and this question is by no means of merely academic
interest, for it was once raised in a big lawsuit involving one of
the largest of the American companies. The policy-holders
maintained that the reserve was really the totality of surrender
values, and so belonged to them, or at least they should have
a voice in determining how it should be managed. The
directors  of  the  Company  contended  that  as  long  as  the 
institution  was  in  a  sound  financial  condition,  and  of  this 
there  was  never  the  slightest  question,  and  as  long  as  they 
met  all  of  their  obligations  with  reasonable  promptness,  it 
was  nobody's  business  but  their  own,  what  they  did  with  the 
reserve.  This  line  of  reasoning  would  seem  flawless,  were 
it  not  for  the  allurement  of  mutuality  or  co-operation  which 
many  companies  hold  out  to  prospective  policy-holders.  Just 
how  much  right  has  a  policy-holder  in  a  mutual  company  to 
a voice in its management? Questions of this sort are
interesting and important, but can hardly be said to fall
naturally under the head of mathematical probability.

TABLES

Table A

The Common Logarithms of \(e^x\) and \(e^{-x}\)

In the columns for log10 e^-x the integral part is a negative characteristic:
1.9565705518 stands for \(\bar{1}.9565705518 = -1 + 0.9565705518\).

     x       log10 e^x       log10 e^-x    |       x         log10 e^x          log10 e^-x
  0.00001   0.0000043429    1.9999956571   |     0.08000     0.0347435586       1.9652564414
  0.00002   0.0000086859    1.9999913141   |     0.09000     0.0390865034       1.9609134966
  0.00003   0.0000130288    1.9999869712   |     0.10000     0.0434294482       1.9565705518
  0.00004   0.0000173718    1.9999826282   |     0.20000     0.0868588964       1.9131411036
  0.00005   0.0000217147    1.9999782853   |     0.30000     0.1302883446       1.8697116554
  0.00006   0.0000260577    1.9999739423   |     0.40000     0.1737177928       1.8262822072
  0.00007   0.0000304006    1.9999695994   |     0.50000     0.2171472410       1.7828527590
  0.00008   0.0000347436    1.9999652564   |     0.60000     0.2605766891       1.7394233109
  0.00009   0.0000390865    1.9999609135   |     0.70000     0.3040061373       1.6959938627
  0.00010   0.0000434294    1.9999565706   |     0.80000     0.3474355855       1.6525644145
  0.00020   0.0000868589    1.9999131411   |     0.90000     0.3908650337       1.6091349663
  0.00030   0.0001302883    1.9998697117   |     1.00000     0.4342944819       1.5657055181
  0.00040   0.0001737178    1.9998262822   |     2.00000     0.8685889638       1.1314110362
  0.00050   0.0002171472    1.9997828528   |     3.00000     1.3028834457       2.6971165543
  0.00060   0.0002605767    1.9997394233   |     4.00000     1.7371779276       2.2628220724
  0.00070   0.0003040061    1.9996959939   |     5.00000     2.1714724095       3.8285275905
  0.00080   0.0003474356    1.9996525644   |     6.00000     2.6057668914       3.3942331086
  0.00090   0.0003908650    1.9996091350   |     7.00000     3.0400613733       4.9599386267
  0.00100   0.0004342945    1.9995657055   |     8.00000     3.4743558552       4.5256441448
  0.00200   0.0008685890    1.9991314110   |     9.00000     3.9086503371       4.0913496629
  0.00300   0.0013028834    1.9986971166   |    10.00000     4.3429448190       5.6570551810
  0.00400   0.0017371779    1.9982628221   |    20.00000     8.6858896381       9.3141103619
  0.00500   0.0021714724    1.9978285276   |    30.00000    13.0288344571      14.9711655429
  0.00600   0.0026057669    1.9973942331   |    40.00000    17.3717792761      18.6282207239
  0.00700   0.0030400614    1.9969599386   |    50.00000    21.7147240952      22.2852759048
  0.00800   0.0034743559    1.9965256441   |    60.00000    26.0576689142      27.9423310858
  0.00900   0.0039086503    1.9960913497   |    70.00000    30.4006137332      31.5993862668
  0.01000   0.0043429448    1.9956570552   |    80.00000    34.7435585523      35.2564414477
  0.02000   0.0086858896    1.9913141104   |    90.00000    39.0865033713      40.9134966287
  0.03000   0.0130288345    1.9869711655   |   100.00000    43.4294481903      44.5705518097
  0.04000   0.0173717793    1.9826282207   |   200.00000    86.8588963807      87.1411036193
  0.05000   0.0217147241    1.9782852759   |   300.00000   130.2883445710     131.7116554290
  0.06000   0.0260576689    1.9739423311   |   400.00000   173.7177927613     174.2822072387
  0.07000   0.0304006137    1.9695993863   |   500.00000   217.1472409516     218.8527590484

Note: \(\log e^{x+y} = \log e^x + \log e^y\). Thus, \(\log e^{113.1478} = 49.139465180\).

Table B

The Probability Integral \(\frac{2}{\sqrt{\pi}}\int_0^x e^{-x^2}\,dx\).

  x        0      1      2      3      4      5      6      7      8      9
 0.00  0.00000  00113  00226  00339  00451  00564  00677  00790  00903  01016
 0.01  0.01128  01241  01354  01467  01580  01692  01805  01918  02031  02144
 0.02  0.02256  02369  02482  02595  02708  02820  02933  03046  03159  03271
 0.03  0.03384  03497  03610  03722  03835  03948  04060  04173  04286  04398
 0.04  0.04511  04624  04736  04849  04962  05074  05187  05299  05412  05525
 0.05  0.05637  05750  05862  05975  06087  06200  06312  06425  06537  06650
 0.06  0.06762  06875  06987  07099  07212  07324  07437  07549  07661  07773
 0.07  0.07886  07998  08110  08223  08335  08447  08559  08671  08784  08896
 0.08  0.09008  09120  09232  09344  09456  09568  09680  09792  09904  10016
 0.09  0.10128  10240  10352  10464  10576  10687  10799  10911  11023  11135
 0.10  0.11246  11358  11470  11581  11693  11805  11916  12028  12139  12251
 0.11  0.12362  12474  12585  12697  12808  12919  13031  13142  13253  13365
 0.12  0.13476  13587  13698  13809  13921  14032  14143  14254  14365  14476
 0.13  0.14587  14698  14809  14919  15030  15141  15252  15363  15473  15584
 0.14  0.15695  15805  15916  16027  16137  16248  16358  16468  16579  16689
 0.15  0.16800  16910  17020  17130  17241  17351  17461  17571  17681  17791
 0.16  0.17901  18011  18121  18231  18341  18451  18560  18670  18780  18890
 0.17  0.18999  19109  19218  19328  19437  19547  19656  19766  19875  19984
 0.18  0.20094  20203  20312  20421  20530  20639  20748  20857  20966  21075
 0.19  0.21184  21293  21402  21510  21619  21728  21836  21945  22053  22162
 0.20  0.22270  22379  22487  22595  22704  22812  22920  23028  23136  23244
 0.21  0.23352  23460  23568  23676  23784  23891  23999  24107  24214  24322
 0.22  0.24430  24537  24643  24752  24859  24967  25074  25181  25288  25395
 0.23  0.25502  25609  25716  25823  25930  26037  26144  26250  26357  26463
 0.24  0.26570  26677  26783  26889  26996  27102  27208  27314  27421  27527
 0.25  0.27633  27739  27845  27950  28056  28162  28268  28373  28479  28584
 0.26  0.28690  28795  28901  29006  29111  29217  29322  29427  29532  29637
 0.27  0.29742  29847  29952  30056  30161  30266  30370  30475  30579  30684
 0.28  0.30788  30892  30997  31101  31205  31309  31413  31517  31621  31725
 0.29  0.31828  31932  32036  32139  32243  32346  32450  32553  32656  32760
 0.30  0.32863  32966  33069  33172  33275  33378  33480  33583  33686  33788
 0.31  0.33891  33993  34096  34198  34300  34403  34505  34607  34709  34811
 0.32  0.34913  35014  35116  35218  35319  35421  35523  35624  35725  35827
 0.33  0.35928  36029  36130  36231  36332  36433  36534  36635  36735  36836
 0.34  0.36936  37037  37137  37238  37338  37438  37538  37638  37738  37838
 0.35  0.37938  38038  38138  38237  38337  38436  38536  38635  38735  38834
 0.36  0.38933  39032  39131  39230  39329  39428  39526  39625  39724  39822
 0.37  0.39921  40019  40117  40215  40314  40412  40510  40608  40705  40803
 0.38  0.40901  40999  41096  41194  41291  41388  41486  41583  41680  41777
 0.39  0.41874  41971  42068  42164  42261  42358  42454  42550  42647  42743
 0.40  0.42839  42935  43031  43127  43223  43319  43415  43510  43606  43701
 0.41  0.43797  43892  43988  44083  44178  44273  44368  44463  44557  44652
 0.42  0.44747  44841  44936  45030  45124  45219  45313  45407  45501  45595
 0.43  0.45689  45782  45876  45970  46063  46157  46250  46343  46436  46529
 0.44  0.46623  46715  46808  46901  46994  47086  47179  47271  47364  47456
 0.45  0.47548  47640  47732  47824  47916  48008  48100  48191  48283  48374
 0.46  0.48466  48557  48648  48739  48830  48921  49012  49103  49193  49284
 0.47  0.49375  49465  49555  49646  49736  49826  49916  50006  50096  50185
 0.48  0.50275  50365  50454  50543  50633  50722  50811  50900  50989  51078
 0.49  0.51167  51256  51344  51433  51521  51609  51698  51786  51874  51962

The Probability Integral.

$\left(\frac{2}{\sqrt{\pi}}\int_0^x e^{-x^2}\,dx\right)$

x     0        1      2      3      4      5      6      7      8      9

0.50  0.52050  52138  52226  52313  52401  52488  52576  52663  52750  52837
0.51  0.52924  53011  53098  53185  53272  53358  53445  53531  53617  53704
0.52  0.53790  53876  53962  54048  54134  54219  54305  54390  54476  54561
0.53  0.54646  54732  54817  54902  54987  55071  55156  55241  55325  55410
0.54  0.55494  55578  55662  55746  55830  55914  55998  56082  56165  56249
0.55  0.56332  56416  56499  56582  56665  56748  56831  56914  56996  57079
0.56  0.57162  57244  57326  57409  57491  57573  57655  57737  57818  57900
0.57  0.57982  58063  58144  58226  58307  58388  58469  58550  58631  58712
0.58  0.58792  58873  58953  59034  59114  59194  59274  59354  59434  59514
0.59  0.59594  59673  59753  59832  59912  59991  60070  60149  60228  60307

0.60  0.60386  60464  60543  60621  60700  60778  60856  60934  61012  61090
0.61  0.61168  61246  61323  61401  61478  61556  61633  61710  61787  61864
0.62  0.61941  62018  62095  62171  62248  62324  62400  62477  62553  62629
0.63  0.62705  62780  62856  62932  63007  63083  63158  63233  63309  63384
0.64  0.63459  63533  63608  63683  63757  63832  63906  63981  64055  64129
0.65  0.64203  64277  64351  64424  64498  64572  64645  64718  64791  64865
0.66  0.64938  65011  65083  65156  65229  65301  65374  65446  65519  65591
0.67  0.65663  65735  65807  65878  65950  66022  66093  66165  66236  66307
0.68  0.66378  66449  66520  66591  66662  66732  66803  66873  66944  67014
0.69  0.67084  67154  67224  67294  67364  67433  67503  67572  67642  67711

0.70  0.67780  67849  67918  67987  68056  68125  68193  68262  68330  68398
0.71  0.68467  68535  68603  68671  68738  68806  68874  68941  69009  69076
0.72  0.69143  69210  69278  69344  69411  69478  69545  69611  69678  69744
0.73  0.69810  69877  69943  70009  70075  70140  70206  70272  70337  70403
0.74  0.70468  70533  70598  70663  70728  70793  70858  70922  70987  71051
0.75  0.71116  71180  71244  71308  71372  71436  71500  71563  71627  71690
0.76  0.71754  71817  71880  71943  72006  72069  72132  72195  72257  72320
0.77  0.72382  72444  72507  72569  72631  72693  72755  72816  72878  72940
0.78  0.73001  73062  73124  73185  73246  73307  73368  73429  73489  73550
0.79  0.73610  73671  73731  73791  73851  73911  73971  74031  74091  74151

0.80  0.74210  74270  74329  74388  74447  74506  74565  74624  74683  74742
0.81  0.74800  74859  74917  74976  75034  75092  75150  75208  75266  75323
0.82  0.75381  75439  75496  75553  75611  75668  75725  75782  75839  75896
0.83  0.75952  76009  76066  76122  76178  76234  76291  76347  76403  76459
0.84  0.76514  76570  76626  76681  76736  76792  76847  76902  76957  77012
0.85  0.77067  77122  77176  77231  77285  77340  77394  77448  77502  77556
0.86  0.77610  77664  77718  77771  77825  77878  77932  77985  78038  78091
0.87  0.78144  78197  78250  78302  78355  78408  78460  78512  78565  78617
0.88  0.78669  78721  78773  78824  78876  78928  78979  79031  79082  79133
0.89  0.79184  79235  79286  79337  79388  79439  79489  79540  79590  79641

0.90  0.79691  79741  79791  79841  79891  79941  79990  80040  80090  80139
0.91  0.80188  80238  80287  80336  80385  80434  80482  80531  80580  80628
0.92  0.80677  80725  80773  80822  80870  80918  80966  81013  81061  81109
0.93  0.81156  81204  81251  81299  81346  81393  81440  81487  81534  81580
0.94  0.81627  81674  81720  81767  81813  81859  81905  81951  81997  82043
0.95  0.82089  82135  82180  82226  82271  82317  82362  82407  82452  82497
0.96  0.82542  82587  82632  82677  82721  82766  82810  82855  82899  82943
0.97  0.82987  83031  83075  83119  83162  83206  83250  83293  83337  83380
0.98  0.83423  83466  83509  83552  83595  83638  83681  83723  83766  83808
0.99  0.83851  83893  83935  83977  84020  84061  84103  84145  84187  84229
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The  Probability  Integral. 


$\left(\frac{2}{\sqrt{\pi}}\int_0^x e^{-x^2}\,dx\right)$

x     0        1      2      3      4      5      6      7      8      9
1.00  0.84270  84312  84353  84394  84435  84477  84518  84559  84600  84640

1.01  0.84681  84722  84762  84803  84843  84883  84924  84964  85004  85044
1.02  0.85084  85124  85163  85203  85243  85282  85322  85361  85400  85439
1.03  0.85478  85517  85556  85595  85634  85673  85711  85750  85788  85827
1.04  0.85865  85903  85941  85979  86017  86055  86093  86131  86169  86206
1.05  0.86244  86281  86318  86356  86393  86430  86467  86504  86541  86578
1.06  0.86614  86651  86688  86724  86760  86797  86833  86869  86905  86941
1.07  0.86977  87013  87049  87085  87120  87156  87191  87227  87262  87297
1.08  0.87333  87368  87403  87438  87473  87507  87542  87577  87611  87646
1.09  0.87680  87715  87749  87783  87817  87851  87885  87919  87953  87987

1.10  0.88021  88054  88088  88121  88155  88188  88221  88254  88287  88320
1.11  0.88353  88386  88419  88452  88484  88517  88549  88582  88614  88647
1.12  0.88679  88711  88743  88775  88807  88839  88871  88902  88934  88966
1.13  0.88997  89029  89060  89091  89122  89154  89185  89216  89247  89277
1.14  0.89308  89339  89370  89400  89431  89461  89492  89522  89552  89582
1.15  0.89612  89642  89672  89702  89732  89762  89792  89821  89851  89880
1.16  0.89910  89939  89968  89997  90027  90055  90085  90114  90142  90171
1.17  0.90200  90229  90257  90286  90314  90343  90371  90399  90428  90456
1.18  0.90484  90512  90540  90568  90595  90623  90651  90678  90706  90733
1.19  0.90761  90788  90815  90843  90870  90897  90924  90951  90978  91005

1.20  0.91031  91058  91085  91111  91138  91164  91191  91217  91243  91269
1.21  0.91296  91322  91348  91374  91399  91425  91451  91477  91502  91528
1.22  0.91553  91579  91604  91630  91655  91680  91705  91730  91755  91780
1.23  0.91805  91830  91855  91879  91904  91929  91953  91978  92002  92026
1.24  0.92051  92075  92099  92123  92147  92171  92195  92219  92243  92266
1.25  0.92290  92314  92337  92361  92384  92408  92431  92454  92477  92500
1.26  0.92524  92547  92570  92593  92615  92638  92661  92684  92706  92729
1.27  0.92751  92774  92796  92819  92841  92863  92885  92907  92929  92951
1.28  0.92973  92995  93017  93039  93061  93082  93104  93126  93147  93168
1.29  0.93190  93211  93232  93254  93275  93296  93317  93338  93359  93380

1.30  0.93401  93422  93442  93463  93484  93504  93525  93545  93566  93586
1.31  0.93606  93627  93647  93667  93687  93707  93727  93747  93767  93787
1.32  0.93807  93826  93846  93866  93885  93905  93924  93944  93963  93982
1.33  0.94002  94021  94040  94059  94078  94097  94116  94135  94154  94173
1.34  0.94191  94210  94229  94247  94266  94284  94303  94321  94340  94358
1.35  0.94376  94394  94413  94431  94449  94467  94485  94503  94521  94538
1.36  0.94556  94574  94592  94609  94627  94644  94662  94679  94697  94714
1.37  0.94731  94748  94766  94783  94800  94817  94834  94851  94868  94885
1.38  0.94902  94918  94935  94952  94968  94985  95002  95018  95035  95051
1.39  0.95067  95084  95100  95116  95132  95148  95165  95181  95197  95213

1.40  0.95229  95244  95260  95276  95292  95307  95323  95339  95354  95370
1.41  0.95385  95401  95416  95431  95447  95462  95477  95492  95507  95523
1.42  0.95538  95553  95568  95582  95597  95612  95627  95642  95656  95671
1.43  0.95686  95700  95715  95729  95744  95758  95773  95787  95801  95815
1.44  0.95830  95844  95858  95872  95886  95900  95914  95928  95942  95956
1.45  0.95970  95983  95997  96011  96024  96038  96051  96065  96078  96092
1.46  0.96105  96119  96132  96145  96159  96172  96185  96198  96211  96224
1.47  0.96237  96250  96263  96276  96289  96302  96315  96327  96340  96353
1.48  0.96365  96378  96391  96403  96416  96428  96440  96453  96465  96478
1.49  0.96490  96502  96514  96526  96539  96551  96563  96575  96587  96599
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TABLES 

The Probability Integral.

$\left(\frac{2}{\sqrt{\pi}}\int_0^x e^{-x^2}\,dx\right)$

x     0        2      4      6      8        x     0        2      4      6      8
1.50  0.96611  96634  96658  96681  96705    2.00  0.99532  99536  99540  99544  99548
1.51  0.96728  96751  96774  96796  96819    2.01  0.99552  99556  99560  99564  99568
1.52  0.96841  96864  96886  96908  96930    2.02  0.99572  99576  99580  99583  99587
1.53  0.96952  96973  96995  97016  97037    2.03  0.99591  99594  99598  99601  99605
1.54  0.97059  97080  97100  97121  97142    2.04  0.99609  99612  99616  99619  99622
1.55  0.97162  97183  97203  97223  97243    2.05  0.99626  99629  99633  99636  99639
1.56  0.97263  97283  97302  97322  97341    2.06  0.99642  99646  99649  99652  99655
1.57  0.97360  97379  97398  97417  97436    2.07  0.99658  99661  99664  99667  99670
1.58  0.97455  97473  97492  97510  97528    2.08  0.99673  99676  99679  99682  99685
1.59  0.97546  97564  97582  97600  97617    2.09  0.99688  99691  99694  99697  99699
1.60  0.97635  97652  97670  97687  97704    2.10  0.99702  99705  99707  99710  99713
1.61  0.97721  97738  97754  97771  97787    2.11  0.99715  99718  99721  99723  99726
1.62  0.97804  97820  97836  97852  97868    2.12  0.99728  99731  99733  99736  99738
1.63  0.97884  97900  97916  97931  97947    2.13  0.99741  99743  99745  99748  99750
1.64  0.97962  97977  97993  98008  98023    2.14  0.99753  99755  99757  99759  99762
1.65  0.98038  98052  98067  98082  98096    2.15  0.99764  99766  99768  99770  99773
1.66  0.98110  98125  98139  98153  98167    2.16  0.99775  99777  99779  99781  99783
1.67  0.98181  98195  98209  98222  98236    2.17  0.99785  99787  99789  99791  99793
1.68  0.98249  98263  98276  98289  98302    2.18  0.99795  99797  99799  99801  99803
1.69  0.98315  98328  98341  98354  98366    2.19  0.99805  99806  99808  99810  99812
1.70  0.98379  98392  98404  98416  98429    2.20  0.99814  99815  99817  99819  99821
1.71  0.98441  98453  98465  98477  98489    2.21  0.99822  99824  99826  99827  99829
1.72  0.98500  98512  98524  98535  98546    2.22  0.99831  99832  99834  99836  99837
1.73  0.98558  98569  98580  98591  98602    2.23  0.99839  99840  99842  99843  99845
1.74  0.98613  98624  98635  98646  98657    2.24  0.99846  99848  99849  99851  99852
1.75  0.98667  98678  98688  98699  98709    2.25  0.99854  99855  99857  99858  99859
1.76  0.98719  98729  98739  98749  98759    2.26  0.99861  99862  99863  99865  99866
1.77  0.98769  98779  98789  98798  98808    2.27  0.99867  99869  99870  99871  99873
1.78  0.98817  98827  98836  98846  98855    2.28  0.99874  99875  99876  99877  99879
1.79  0.98864  98873  98882  98891  98900    2.29  0.99880  99881  99882  99883  99885
1.80  0.98909  98918  98927  98935  98944    2.30  0.99886  99887  99888  99889  99890
1.81  0.98952  98961  98969  98978  98986    2.31  0.99891  99892  99893  99894  99896
1.82  0.98994  99003  99011  99019  99027    2.32  0.99897  99898  99899  99900  99901
1.83  0.99035  99043  99050  99058  99066    2.33  0.99902  99903  99904  99905  99906
1.84  0.99074  99081  99089  99096  99104    2.34  0.99906  99907  99908  99909  99910
1.85  0.99111  99118  99126  99133  99140    2.35  0.99911  99912  99913  99914  99915
1.86  0.99147  99154  99161  99168  99175    2.36  0.99915  99916  99917  99918  99919
1.87  0.99182  99189  99196  99202  99209    2.37  0.99920  99920  99921  99922  99923
1.88  0.99216  99222  99229  99235  99242    2.38  0.99924  99924  99925  99926  99927
1.89  0.99248  99254  99261  99267  99273    2.39  0.99928  99928  99929  99930  99930
1.90  0.99279  99285  99291  99297  99303    2.40  0.99931  99932  99933  99933  99934
1.91  0.99309  99315  99321  99326  99332    2.41  0.99935  99935  99936  99937  99937
1.92  0.99338  99343  99349  99355  99360    2.42  0.99938  99939  99939  99940  99940
1.93  0.99366  99371  99376  99382  99387    2.43  0.99941  99942  99942  99943  99943
1.94  0.99392  99397  99403  99408  99413    2.44  0.99944  99945  99945  99946  99946
1.95  0.99418  99423  99428  99433  99438    2.45  0.99947  99947  99948  99949  99949
1.96  0.99443  99447  99452  99457  99462    2.46  0.99950  99950  99951  99951  99952
1.97  0.99466  99471  99476  99480  99485    2.47  0.99952  99953  99953  99954  99954
1.98  0.99489  99494  99498  99502  99507    2.48  0.99955  99955  99956  99956  99957
1.99  0.99511  99515  99520  99524  99528    2.49  0.99957  99958  99958  99958  99959
2.00  0.99532  99536  99540  99544  99548    2.50  0.99959  99960  99960  99961  99961

x     0        1      2      3      4      5      6      7      8      9

2.5   0.99959  99961  99963  99965  99967  99969  99971  99972  99974  99975
2.6   0.99976  99978  99979  99980  99981  99982  99983  99984  99985  99986
2.7   0.99987  99987  99988  99989  99989  99990  99991  99991  99992  99992
2.8   0.99992  99993  99993  99994  99994  99994  99995  99995  99995  99996
2.9   0.99996  99996  99996  99997  99997  99997  99997  99997  99997  99998
3.0   0.99998  99998  99998  99998  99998  99998  99998  99998  99999  99999

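The value of the Probability Integral may always be found from the convergent series

$$\frac{2}{\sqrt{\pi}}\int_0^x e^{-x^2}\,dx = \frac{2}{\sqrt{\pi}}\left(x - \frac{x^3}{1!\,3} + \frac{x^5}{2!\,5} - \frac{x^7}{3!\,7} + \cdots\right),$$

but for large values of $x$ the semiconvergent series

$$\frac{2}{\sqrt{\pi}}\int_0^x e^{-x^2}\,dx = 1 - \frac{e^{-x^2}}{x\sqrt{\pi}}\left(1 - \frac{1}{2x^2} + \frac{1\cdot 3}{(2x^2)^2} - \frac{1\cdot 3\cdot 5}{(2x^2)^3} + \cdots\right)$$

is convenient.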
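
The remark above on the two series can be tried numerically. The following Python sketch is only an illustration, not part of the original tables: it assumes that the convergent series is the usual power series of the integral about $x = 0$ and that the semiconvergent series is the standard asymptotic expansion in powers of $1/(2x^2)$. The function names `erf_power_series` and `erf_asymptotic` are hypothetical, and the library function `math.erf` serves only as a reference value.

```python
import math


def erf_power_series(x, tol=1e-15, max_terms=200):
    """Convergent series (assumed form):
    2/sqrt(pi) * sum_{n>=0} (-1)^n x^(2n+1) / (n! (2n+1))."""
    total = 0.0
    coeff = x  # holds (-1)^n x^(2n+1) / n!, starting with n = 0
    for n in range(max_terms):
        term = coeff / (2 * n + 1)
        total += term
        if abs(term) < tol:
            break
        coeff *= -x * x / (n + 1)
    return 2.0 / math.sqrt(math.pi) * total


def erf_asymptotic(x, max_terms=20):
    """Semiconvergent series (assumed form), for x > 0:
    1 - exp(-x^2)/(x sqrt(pi)) * (1 - 1/(2x^2) + 1*3/(2x^2)^2 - ...).
    The sum is cut off at its smallest term, after which the terms grow."""
    total = 0.0
    term = 1.0
    for k in range(max_terms):
        total += term
        next_term = -term * (2 * k + 1) / (2.0 * x * x)
        if abs(next_term) >= abs(term):
            break  # terms have started to increase: stop here
        term = next_term
    return 1.0 - math.exp(-x * x) / (x * math.sqrt(math.pi)) * total


if __name__ == "__main__":
    # Compare both series with the library value, to five decimals as in the tables.
    for x in (0.5, 1.0, 2.0, 3.0):
        print(x,
              round(erf_power_series(x), 5),
              round(erf_asymptotic(x), 5),
              round(math.erf(x), 5))
```

The power series needs only a few terms for small $x$. The semiconvergent series is of no use near $x = 0$, but becomes accurate quickly as $x$ grows, because its smallest term, which limits the attainable accuracy, shrinks rapidly.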
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Annuity,  196,  200,  201. 
Assumptions,  empirical,  4,  6,  10. 

Banker,  defined,  52. 

Cause,  defined,  88. 
Chance,  Banker's,  52,  53,  55. 
Chance, games of, 52 ff.
Chance,  Player's,  52,  53,  55. 
Chaos,  molecular,  188. 
Coefficient,  correlation,  145-9. 
—  —  defined,  146. 
Combinations,  formulae  for,  13,  14. 
Conditions,  essential,  defined,  5. 
Correlation,  strong  and  weak,  147. 
Craps, 21, 51.
Curve  fitting,  163  ff. 

Deviation,  standard,  defined,  66. 
Discrepancy,  average,  50. 
— ,  defined,  34. 
— ,  mean,  49. 
— ,  probable,  50. 
Dispersion,  defined,  66. 
— ,  normal,  68. 
— , sub-normal, 69.
— ,  super-normal,  69. 
— ,  theorem,  67. 

Distribution, normal, for velocities, 179.

Ellipse,  error,  142. 

— ,  probable,  143. 

Endowment,  195. 

Equations, normal, 153, 154, 157, 159.
— ,  residual,  154. 
Errors,  accidental,  defined,  102. 
— ,  assumptions  for,  103,  114,  116. 
—,  average,  107, 119,  121,  123. 
— ,  constant,  defined,  102. 
— , formulae, 115, 121, 123.
— , fundamental, 103, 104.
Errors, Gaussian law for, 107, 113 ff., 152.
— , — — — , in many variables, 130 ff.
— , mean, 107, 108, 111.
— , probable, 107, 119, 121, 123, 162.
— , residual, 109, 120, 121.
Events,  compound,  17. 
— ,  independent,  18. 
Expectation, defined, 25.
—  of  life,  197. 

Factorial,  defined,  13. 
Fair,  defined,  26. 
Favourable,  defined,  26. 
Force,  of  interest,  195. 
— ,  of  mortality,  192. 

Gas, assumed properties, 171, 172, 175.
— , statistical theory, Chap. X.
Graduation of mortality, statistics, 192.

Hyperspace,  defined,  173. 

Inequality,  Tchebycheff's,  64,  67. 
Insurance,  199,  200,  201. 
Integral,  probability,  43,  45. 
— , — , tables, 209-13.

Law, Maxwell's, 176 ff.
— ,  Poisson's,  65. 
Likely,  equally,  7-10. 

Mean,  weighted,  106,  109,  131. 

Median, 113.

Mode,  113. 

Moments,  method  of,  168. 

Monte  Carlo,  56,  57,  58. 

Mortality,  force  of,  192. 
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Observations,  conditional,  162. 
— ,  doubtful,  125. 

Paradox,  Bertrand's  box,  90. 

— ,  — ,  geometrical,  75. 

— ,  Petrograd,  27,  28. 

Player,  defined,  52. 

Poker,  21,  51. 

Precision,  118,  121,  123,  162. 

Premiums,  201,  202,  208. 

Principle,  Bayes',  89,  100. 

— ,  — ,  for  future  events,  97. 

Probability, compound, 18, 22, 23, 24.
— ,  defined,  1-5. 
—,  of  survival,  191,  192. 
— ,  total,  17,  24. 
Problem, Buffon's needle, 80, 81, 83.

Ratio,  correlation,  149,  150. 
Regression,  line  of,  148. 
Reserve,  for  insurance,  203. 
Risk,  30  ff. 
Roulette,  56,  57,  58. 


Ruin, chance of, Chap. III, § 4.
— ,  defined,  52. 

Series,  Bernoulli,  68,  71. 

— ,  Lexis,  69,  71. 

— ,  Poisson,  40,  70,  71. 

State,  normal  for  gas,  179,  180. 

Sunrise,  probability  for,  99. 

Theorem, Bernoulli's, 22, 37, 42, 48.

— ,  — ,  converse  to,  93,  94. 

— ,  Duhamel's,  43. 

— ,  fundamental  dispersion,  67. 

— , fundamental form, games of chance, 55.

— ,  Laplace's,  45,  46. 

Turns, fair, favourable, and unfavourable, 26.

Unfavourable,  defined,  26. 

Weights, in direct measurements, 109, 132.
— , indirect measurements, 159 ff.
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